ARTFEED — Contemporary Art Intelligence

TRACER: Turn-Level Reinforcement Framework for Multi-LLM Reasoning

ai-technology · 2026-05-28

Researchers introduced TRACER, a turn-level reinforcement framework designed to improve cooperative reasoning among multiple large language models. The framework addresses challenges in multi-agent systems such as sparse rewards, role-level free-riding, excessive training overhead, imitation-only collaboration, and oscillating local optima. TRACER separates decision-making into a controller-regret layer and a generation-credit layer. In the controller-regret layer, controllers use regret matching to decide whether agents should speak or skip a turn. The generation-credit layer optimizes proposer and reviewer utterances using role-specific GSPO rewards. Credit is therefore assigned at two levels: the action mode (speak or skip) and the individual utterance. The work was published on arXiv under ID 2605.28699.

Key facts

  • TRACER is a turn-level reinforcement framework for cooperative multi-LLM reasoning.
  • It addresses sparse rewards, role-level free-riding, and excessive training overhead.
  • The framework separates decision-making into a controller-regret layer and a generation-credit layer.
  • Controllers use regret matching to decide whether an agent speaks or skips a turn.
  • The generation-credit layer optimizes proposer and reviewer utterances with role-specific GSPO rewards.
  • TRACER assigns credit at both the action-mode and utterance levels.
  • The paper is available on arXiv with ID 2605.28699.
  • The framework aims to combine reinforcement learning and multi-agent prompting.

Entities

Institutions

  • arXiv

Sources