TRACER: Turn-Level Reinforcement Framework for Multi-LLM Reasoning
Researchers introduced TRACER, a turn-level reinforcement framework designed to improve cooperative reasoning among multiple large language models. The framework targets five problems in multi-agent systems: sparse rewards, role-level free-riding, excessive training overhead, imitation-only collaboration, and oscillating local optima. TRACER splits decision-making into two layers. In the controller-regret layer, controllers use regret matching to decide whether an agent should speak or skip a turn. In the generation-credit layer, proposer and reviewer utterances are optimized with role-specific GSPO rewards. Credit is therefore assigned at two levels: the action mode (speak or skip) and the individual utterance. The work was published on arXiv under ID 2605.28699.
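The speak-or-skip decision can be illustrated with textbook regret matching. The sketch below is not taken from the paper: the class name, the two-action set, and the utility numbers are hypothetical, and it assumes the controller can observe a counterfactual utility for both actions on each turn.

```python
import random

ACTIONS = ("speak", "skip")


class RegretMatchingController:
    """Hypothetical controller that picks speak/skip by regret matching."""

    def __init__(self, seed=0):
        # Cumulative regret for not having played each action.
        self.cum_regret = {a: 0.0 for a in ACTIONS}
        self.rng = random.Random(seed)

    def strategy(self):
        # Play each action in proportion to its positive cumulative regret;
        # fall back to uniform when no action has positive regret.
        positive = {a: max(r, 0.0) for a, r in self.cum_regret.items()}
        total = sum(positive.values())
        if total <= 0.0:
            return {a: 1.0 / len(ACTIONS) for a in ACTIONS}
        return {a: p / total for a, p in positive.items()}

    def act(self):
        probs = self.strategy()
        return self.rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]

    def update(self, played, utilities):
        # Regret of action a = utility it would have earned minus the
        # utility of the action actually played.
        baseline = utilities[played]
        for a in ACTIONS:
            self.cum_regret[a] += utilities[a] - baseline


controller = RegretMatchingController()
# Assumed utilities for illustration only: speaking helps on this turn type.
for _ in range(50):
    action = controller.act()
    controller.update(action, {"speak": 1.0, "skip": 0.2})
print(controller.strategy())
```

Under these assumed utilities the controller shifts its probability mass toward speaking once skipping has been tried and found worse. How TRACER actually estimates the utility of the action not taken is not described in this summary.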
Key facts
- TRACER is a turn-level reinforcement framework for cooperative multi-LLM reasoning.
- It addresses sparse rewards, role-level free-riding, excessive training overhead, imitation-only collaboration, and oscillating local optima.
- The framework separates decision-making into a controller-regret layer and a generation-credit layer.
- Controllers use regret matching to decide whether each agent speaks or skips a turn.
- The generation-credit layer optimizes proposer and reviewer utterances with role-specific GSPO rewards.
- TRACER assigns credit at both the action-mode and utterance levels.
- The paper is available on arXiv with ID 2605.28699.
- The framework aims to combine reinforcement learning and multi-agent prompting.
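To make the generation-credit idea concrete, the following is a minimal sketch of a group-based, sequence-level objective with a separate reward function per role. It assumes that GSPO here refers to Group Sequence Policy Optimization; the reward functions, the `eps` value, and all names are hypothetical placeholders, not the paper's definitions.

```python
import math

# Hypothetical role-specific rewards; the paper's actual rewards are not given here.
ROLE_REWARDS = {
    "proposer": lambda outcome: 1.0 if outcome["answer_correct"] else 0.0,
    "reviewer": lambda outcome: 1.0 if outcome["flagged_real_error"] else 0.0,
}


def group_advantages(rewards):
    """Normalize rewards within one group of sampled utterances."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + 1e-8) for r in rewards]


def sequence_ratio(new_logprobs, old_logprobs):
    """Length-normalized importance ratio for a whole utterance."""
    n = len(new_logprobs)
    return math.exp((sum(new_logprobs) - sum(old_logprobs)) / n)


def clipped_objective(ratios, advantages, eps=0.2):
    """Clipped surrogate averaged over the group (eps is a placeholder)."""
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(min(r, 1.0 + eps), 1.0 - eps)
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


# Score a group of two proposer utterances with the proposer's own reward.
outcomes = [{"answer_correct": True}, {"answer_correct": False}]
rewards = [ROLE_REWARDS["proposer"](o) for o in outcomes]
advantages = group_advantages(rewards)
ratios = [sequence_ratio([-0.5, -0.7], [-0.6, -0.8]),
          sequence_ratio([-1.0, -1.2], [-1.0, -1.2])]
print(clipped_objective(ratios, advantages))
```

Scoring each role with its own reward is one way to keep a reviewer from being credited for a proposer's correct answer, which is the kind of free-riding the key facts mention.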
Entities
Institutions
- arXiv