Critique-and-Routing Controller for Multi-Agent LLM Systems

ai-technology · 2026-05-12

A new critique-and-routing controller for multi-agent LLM systems treats coordination as a sequential decision problem. Drafts are refined iteratively instead of being produced by a single, one-shot model selection. At each turn, the controller evaluates the current draft and decides whether to stop or to route the draft to another agent for improvement. The problem is formulated as a finite-horizon Markov decision process (MDP) with explicit agent-utilization constraints. The controller is trained with policy gradients on a composite reward under a Lagrangian-relaxed objective. The authors report extensive experiments supporting the controller's effectiveness.
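
The turn-by-turn loop described above can be sketched as a finite-horizon rollout. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The names `run_episode`, `STOP`, `Episode` and `call_cost` are hypothetical. Reducing each critique to a single scalar critic score is a simplification. The composite reward shown here (final critic score minus a per-call cost) is also an assumed stand-in, since the source does not give the actual reward terms.

```python
from dataclasses import dataclass, field
from typing import Callable, List

STOP = -1  # hypothetical sentinel action: accept the current draft

# Hypothetical signatures: an agent maps (task, draft) to a revised draft,
# the critic maps (task, draft) to a score in [0, 1], and the policy maps
# (critic score, turn index) to STOP or an agent index.
Agent = Callable[[str, str], str]
Critic = Callable[[str, str], float]
Policy = Callable[[float, int], int]


@dataclass
class Episode:
    draft: str
    calls: List[int] = field(default_factory=list)  # agent index used at each turn
    reward: float = 0.0

    @property
    def turns(self) -> int:
        return len(self.calls)


def run_episode(task: str, agents: List[Agent], critic: Critic, policy: Policy,
                horizon: int, call_cost: float = 0.05) -> Episode:
    """Roll out one finite-horizon critique-and-routing episode."""
    draft = ""
    calls: List[int] = []
    for t in range(horizon):  # the horizon caps the number of agent calls
        score = critic(task, draft)  # critique the current draft
        action = policy(score, t)    # stop, or route to another agent
        if action == STOP:
            break
        draft = agents[action](task, draft)
        calls.append(action)
    # Hypothetical composite reward: final critic score minus a cost per agent call.
    reward = critic(task, draft) - call_cost * len(calls)
    return Episode(draft, calls, reward)


# Toy usage with string-appending "agents" and a length-based critic.
agents: List[Agent] = [
    lambda task, d: d + " outline",  # hypothetical planner agent
    lambda task, d: d + " details",  # hypothetical writer agent
]


def toy_critic(task: str, draft: str) -> float:
    return min(1.0, len(draft.split()) / 3)


def toy_policy(score: float, turn: int) -> int:
    return STOP if score >= 1.0 else turn % len(agents)


result = run_episode("demo task", agents, toy_critic, toy_policy, horizon=5)
print(result.calls, round(result.reward, 3), result.draft.strip())
```

In this toy run the policy alternates between the two agents until the critic score reaches 1.0 and then stops. If a policy never chooses `STOP`, the loop still ends after `horizon` turns, which is what makes the decision problem finite-horizon.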

Key facts

Proposes a critique-and-routing controller for multi-agent LLM systems
Casts multi-agent coordination as a sequential decision problem
Controller evaluates current draft at each turn
Decides whether to stop or continue, and selects the next agent when continuing
Formulated as finite-horizon Markov Decision Process (MDP)
Includes explicit agent-utilization constraints
Composite reward designed for controller decisions across turns
Optimized via policy gradients under Lagrangian-relaxed objective
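
The constrained training objective can be illustrated with a primal-dual REINFORCE sketch. None of the following details come from the source, and the names `train`, `sample_episode`, `gains`, `budgets` and `call_cost` are hypothetical. The sketch assumes:

- a tabular softmax policy indexed only by turn number;
- a toy environment in which agent k closes a fixed fraction `gains[k]` of the remaining quality gap;
- a reward of final quality minus a per-call cost;
- a constraint that the expected number of calls to agent k per episode stays within `budgets[k]`.

Under these assumptions, the relaxed problem is max over θ, min over λ ≥ 0 of E[R] − Σ_k λ_k (E[n_k] − b_k).

```python
import math
import random
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_episode(theta: List[List[float]], gains: List[float], horizon: int,
                   call_cost: float, rng: random.Random
                   ) -> Tuple[List[Tuple[int, int]], List[int], float]:
    """One toy episode. Action 0 = STOP; action k + 1 = call agent k."""
    quality = 0.0
    counts = [0] * len(gains)
    trajectory: List[Tuple[int, int]] = []  # (turn, action) pairs
    for t in range(horizon):
        probs = softmax(theta[t])
        a = rng.choices(range(len(probs)), weights=probs)[0]
        trajectory.append((t, a))
        if a == 0:
            break
        counts[a - 1] += 1
        # Toy dynamics: agent k closes a fixed fraction of the remaining quality gap.
        quality += gains[a - 1] * (1.0 - quality)
    reward = quality - call_cost * sum(counts)  # hypothetical composite reward
    return trajectory, counts, reward


def train(gains: List[float], budgets: List[float], horizon: int = 4,
          call_cost: float = 0.02, iters: int = 300, batch: int = 32,
          lr: float = 0.1, dual_lr: float = 0.05, seed: int = 0
          ) -> Tuple[List[List[float]], List[float]]:
    rng = random.Random(seed)
    n_actions = len(gains) + 1
    theta = [[0.0] * n_actions for _ in range(horizon)]  # per-turn logits (assumed state)
    lam = [0.0] * len(gains)  # one multiplier per utilization constraint
    baseline = 0.0
    for _ in range(iters):
        grads = [[0.0] * n_actions for _ in range(horizon)]
        avg_counts = [0.0] * len(gains)
        for _ in range(batch):
            traj, counts, reward = sample_episode(theta, gains, horizon, call_cost, rng)
            # Lagrangian-relaxed return: charge lam[k] for every call to agent k.
            shaped = reward - sum(lk * c for lk, c in zip(lam, counts))
            adv = shaped - baseline
            baseline += 0.01 * (shaped - baseline)  # running-mean baseline
            for t, a in traj:
                probs = softmax(theta[t])
                for j in range(n_actions):
                    # Gradient of log softmax: 1[j == a] - pi(j | t).
                    grads[t][j] += adv * ((1.0 if j == a else 0.0) - probs[j])
            for k, c in enumerate(counts):
                avg_counts[k] += c / batch
        # Primal step: REINFORCE ascent on the relaxed objective.
        for t in range(horizon):
            for j in range(n_actions):
                theta[t][j] += lr * grads[t][j] / batch
        # Dual step: projected ascent keeps every multiplier non-negative.
        for k in range(len(gains)):
            lam[k] = max(0.0, lam[k] + dual_lr * (avg_counts[k] - budgets[k]))
    return theta, lam


theta, lam = train(gains=[0.3, 0.6], budgets=[2.0, 0.5])
print("multipliers:", [round(x, 3) for x in lam])
print("turn-0 policy [STOP, agent 0, agent 1]:", [round(p, 3) for p in softmax(theta[0])])
```

Subtracting λ_k for each call to agent k from an episode's return differs from the Lagrangian only by the constant Σ_k λ_k b_k, so it yields the same policy gradient. The dual step raises a multiplier while its agent is used more than its budget allows and lowers it, never below zero, once usage falls back within budget.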
