Compound LLM Agent Design in Adversarial POMDP: Cost-Performance Study
A new study on arXiv examines the cost-performance trade-offs of compound LLM agent design in adversarial, partially observable environments. Using the CybORG CAGE-2 cyber defense simulation, the researchers evaluated six models from five model families in twelve configurations across 3,475 episodes. They varied context representation, deliberation methods (self-questioning, self-critique, self-improvement), and task decomposition. Every configuration earned non-positive rewards, so all of them operated in failure-mitigation mode, limiting losses rather than accumulating gains. The study provides token-level cost accounting to help practitioners tell which design choices improve performance and which merely increase inference cost.
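To make the idea of token-level cost accounting concrete, the following is a minimal sketch, not the study's own tooling. The `EpisodeLog` fields, the function names, and the per-million-token prices are all hypothetical and chosen for illustration. Because rewards are non-positive, the sketch compares a candidate configuration with a baseline by asking how many extra dollars per episode each recovered reward point costs.

```python
from dataclasses import dataclass


@dataclass
class EpisodeLog:
    """Token usage and total reward for one episode (hypothetical schema)."""
    config: str
    input_tokens: int
    output_tokens: int
    reward: float  # non-positive in a failure-mitigation setting


def episode_cost(log, usd_per_m_input, usd_per_m_output):
    """Dollar cost of one episode given per-million-token prices."""
    return (log.input_tokens * usd_per_m_input
            + log.output_tokens * usd_per_m_output) / 1_000_000


def summarize(logs, usd_per_m_input, usd_per_m_output):
    """Mean reward and mean dollar cost per configuration."""
    totals = {}
    for log in logs:
        reward_sum, cost_sum, n = totals.get(log.config, (0.0, 0.0, 0))
        totals[log.config] = (
            reward_sum + log.reward,
            cost_sum + episode_cost(log, usd_per_m_input, usd_per_m_output),
            n + 1,
        )
    return {c: {"mean_reward": r / n, "mean_cost_usd": k / n}
            for c, (r, k, n) in totals.items()}


def marginal_cost_per_reward_point(summary, baseline, candidate):
    """Extra dollars per episode for each reward point recovered over the baseline.

    Returns None when the candidate does not improve on the baseline,
    i.e. when the extra tokens bought no performance at all.
    """
    gain = summary[candidate]["mean_reward"] - summary[baseline]["mean_reward"]
    extra = summary[candidate]["mean_cost_usd"] - summary[baseline]["mean_cost_usd"]
    if gain <= 0:
        return None
    return extra / gain
```

A `None` result flags a design choice that only raises inference cost. A small positive value marks one that recovers reward cheaply relative to the baseline.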
Key facts
- Study posted on arXiv (ID 2605.16205)
- Uses CybORG CAGE-2 cyber defense environment
- Environment modeled as a partially observable Markov decision process (POMDP)
- Evaluates six models from five model families
- Tests twelve configurations across 3,475 episodes
- Varies context representation: raw observations vs. deterministic state-tracking with compressed history
- Deliberation includes self-questioning, self-critique, and self-improvement tools
- All configurations have non-positive rewards (failure-mitigation mode)
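The contrast between raw observations and deterministic state-tracking with compressed history can be illustrated with a short sketch. It is an assumption-laden toy, not the study's implementation and not the CybORG API: the `StateTracker` class, the host names, the status strings, and the run-length compression scheme are all hypothetical. The idea is that the agent's prompt carries a compact, rule-based summary instead of every raw observation seen so far.

```python
from collections import deque


class StateTracker:
    """Folds per-step observations into a compact, deterministic summary."""

    def __init__(self, max_history=5):
        self.host_status = {}                     # host -> last known status
        self.history = deque(maxlen=max_history)  # [action, repeat_count] pairs

    def update(self, observation, last_action):
        # Hosts absent from this observation keep their last known status,
        # which is how partial observability is bridged between steps.
        self.host_status.update(observation)
        # Run-length compress consecutive repeats of the same action.
        if self.history and self.history[-1][0] == last_action:
            self.history[-1][1] += 1
        else:
            self.history.append([last_action, 1])

    def render(self):
        """Text block to place in the prompt instead of the raw observation log."""
        hosts = ", ".join(f"{h}={s}" for h, s in sorted(self.host_status.items()))
        actions = " -> ".join(a if n == 1 else f"{a} x{n}" for a, n in self.history)
        return f"hosts: {hosts}\nrecent actions: {actions}"
```

Since the summary's length is bounded by the number of hosts and by `max_history`, the prompt stays roughly constant in size as an episode grows. That is the kind of effect token-level cost accounting would capture.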
Entities
Institutions
- arXiv