Critique-and-Routing Controller for Multi-Agent LLM Systems
A critique-and-routing controller for multi-agent LLM systems treats coordination as a sequential decision problem, so a draft can be refined iteratively rather than produced by a one-shot model selection. At each turn, the controller evaluates the current draft and decides whether to stop or to route it to another agent for improvement. The problem is formulated as a finite-horizon MDP with agent-utilization constraints, and the controller is trained with policy gradients on a composite reward under a Lagrangian-relaxed objective. The authors report that extensive experiments demonstrate its effectiveness.
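The turn-by-turn loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `run_controller`, `toy_policy`, `agent_a`, and `agent_b` are hypothetical, the agents are string-appending stand-ins for LLM calls, and the hand-written policy stands in for a learned one. Starting from an empty draft is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical types (not from the source):
# an agent maps (task, current draft) -> revised draft;
# a policy maps (task, current draft, turn) -> agent index, or None to stop.
Agent = Callable[[str, str], str]
Policy = Callable[[str, str, int], Optional[int]]


@dataclass
class Episode:
    drafts: List[str] = field(default_factory=list)  # draft after each routed turn
    routes: List[int] = field(default_factory=list)  # agent index chosen at each turn


def run_controller(task: str, agents: List[Agent], policy: Policy,
                   horizon: int) -> Tuple[str, Episode]:
    """Roll out one finite-horizon episode of stop-or-route decisions."""
    episode = Episode()
    draft = ""  # assumption: the episode starts from an empty draft
    for turn in range(horizon):
        action = policy(task, draft, turn)  # critique the draft, then decide
        if action is None:                  # controller chooses to stop
            break
        draft = agents[action](task, draft)  # route to the selected agent
        episode.drafts.append(draft)
        episode.routes.append(action)
    return draft, episode


# Toy stand-ins for LLM agents: each appends a marker to the draft.
def agent_a(task: str, draft: str) -> str:
    return draft + "a"


def agent_b(task: str, draft: str) -> str:
    return draft + "b"


def toy_policy(task: str, draft: str, turn: int) -> Optional[int]:
    """Stop once the draft is 'good enough' (length >= 3), else alternate agents."""
    if len(draft) >= 3:
        return None
    return turn % 2


final_draft, episode = run_controller("demo task", [agent_a, agent_b],
                                      toy_policy, horizon=5)
```

The `horizon` argument caps the number of turns, which is what makes the decision problem finite-horizon; the recorded `routes` are the quantity an agent-utilization constraint would be computed over.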
Key facts
- Proposes a critique-and-routing controller for multi-agent LLM systems
- Casts multi-agent coordination as a sequential decision problem
- Controller evaluates the current draft at each turn
- Controller decides whether to stop or continue and, if continuing, selects the next agent
- Formulated as finite-horizon Markov Decision Process (MDP)
- Includes explicit agent-utilization constraints
- Composite reward designed for controller decisions across turns
- Optimized via policy gradients under Lagrangian-relaxed objective
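For orientation, a Lagrangian relaxation of a constrained finite-horizon MDP is commonly written in the generic form below. The source does not give the exact reward or constraint definitions, so every symbol here is an assumption: $\pi_\theta$ is the controller policy, $T$ the horizon, $r_t$ the composite reward at turn $t$, $u_k(\tau)$ the utilization of agent $k$ along trajectory $\tau$, $b_k$ its budget, and $\lambda_k \ge 0$ the corresponding multiplier.

```latex
\max_{\theta} \; \min_{\lambda \ge 0} \;
  \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T-1} r_t \right]
  \;-\; \sum_{k} \lambda_k \left( \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ u_k(\tau) \right] - b_k \right)
```

In this form, the policy parameters $\theta$ can be updated by policy gradients on the penalized return while each $\lambda_k$ rises when agent $k$ exceeds its budget and falls otherwise.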
Entities
Institutions
- arXiv