TRACER: Turn-Level Reinforcement Framework for Multi-LLM Reasoning

ai-technology · 2026-05-28

Researchers have introduced TRACER, a turn-level reinforcement framework designed to improve cooperative reasoning among multiple large language models. The framework targets five problems in multi-agent systems: sparse rewards, role-level free-riding, excessive training overhead, imitation-only collaboration, and oscillating local optima. TRACER separates decision-making into two layers. In the controller-regret layer, controllers use regret matching to decide whether an agent should speak or skip a turn. In the generation-credit layer, proposer and reviewer utterances are optimized with role-specific GSPO rewards. Credit is thus assigned at two levels: the action mode (speak or skip) and the utterance itself. The work was published on arXiv under ID 2605.28699.
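
The speak-or-skip decision can be illustrated with a minimal regret-matching sketch in Python. This is not the paper's implementation: the class name RegretMatchingController, the utilities dictionary, and the assumption that a utility estimate is available for both actions after each turn are hypothetical and chosen only for illustration. The sketch shows the generic regret-matching rule, in which each action is played with probability proportional to its positive cumulative regret.

```python
import random

ACTIONS = ("speak", "skip")


class RegretMatchingController:
    """Hypothetical controller that picks 'speak' or 'skip' by regret matching."""

    def __init__(self, seed=0):
        # Cumulative regret for not having played each action.
        self.cum_regret = {a: 0.0 for a in ACTIONS}
        self.rng = random.Random(seed)

    def strategy(self):
        # Play each action in proportion to its positive cumulative regret.
        positive = {a: max(r, 0.0) for a, r in self.cum_regret.items()}
        total = sum(positive.values())
        if total <= 0.0:
            # No positive regret yet: fall back to a uniform choice.
            return {a: 1.0 / len(ACTIONS) for a in ACTIONS}
        return {a: p / total for a, p in positive.items()}

    def choose(self):
        probs = self.strategy()
        weights = [probs[a] for a in ACTIONS]
        return self.rng.choices(ACTIONS, weights=weights)[0]

    def update(self, chosen, utilities):
        # utilities maps each action to its (assumed) estimated payoff this turn.
        # Regret of an action = its payoff minus the payoff of the action taken.
        baseline = utilities[chosen]
        for a in ACTIONS:
            self.cum_regret[a] += utilities[a] - baseline


controller = RegretMatchingController(seed=1)
# Suppose the agent skipped, but speaking would have been worth more.
controller.update("skip", {"speak": 1.0, "skip": 0.2})
print(controller.strategy())
```

After this single update, only "speak" has positive regret, so the controller shifts all probability mass to speaking on the next turn. How TRACER actually estimates the payoff of the action not taken is not described in this summary.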

Key facts

TRACER is a turn-level reinforcement framework for cooperative multi-LLM reasoning.
It addresses sparse rewards, role-level free-riding, and excessive training overhead.
The framework separates decision-making into a controller-regret layer and a generation-credit layer.
Controllers use regret matching to decide agent turn-taking.
The generation-credit layer uses role-specific GSPO rewards.
TRACER assigns credit at both the action-mode and utterance levels.
The paper is available on arXiv with ID 2605.28699.
The framework aims to combine reinforcement learning and multi-agent prompting.
