New AI Method Distills Multi-Agent Debate into Single LLM
Researchers have developed Latent Agents, a post-training method that internalizes multi-agent debate into a single large language model (LLM). A two-stage fine-tuning pipeline first teaches the model explicit debate structure and then internalizes it, aided by dynamic reward scheduling and length clipping. The resulting internalized models match or exceed explicit multi-agent debate while using up to 93% fewer tokens. Probing the mechanism with activation steering, the researchers find that internalization creates agent-specific subspaces: distinct directions in activation space that encode different agent perspectives. As a practical demonstration, they show that malicious agents can be instilled into the LLM via internalized debate.
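To make the steering idea concrete, here is a minimal sketch of adding an agent-specific direction to a model's hidden states via a PyTorch forward hook. Everything here is an illustrative assumption rather than the paper's actual setup: gpt2 is a stand-in base model, the layer index and steering strength are arbitrary, and the random placeholder vector stands in for a learned agent direction (e.g., a mean activation difference between agent-persona and neutral prompts).

```python
# Hedged sketch: steering activations along a hypothetical agent-specific
# direction. The model, layer, strength, and direction are all placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper's base model is not specified here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 6
d_model = model.config.hidden_size
agent_direction = torch.randn(d_model)   # placeholder for a learned direction
agent_direction /= agent_direction.norm()
alpha = 4.0  # steering strength; sign and magnitude would be tuned empirically

def steer_hook(module, inputs, output):
    # GPT-2 decoder blocks return a tuple whose first element is the
    # hidden states; shift them along the agent direction and pass the
    # rest of the tuple through unchanged.
    hidden = output[0] + alpha * agent_direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(steer_hook)
ids = tok("The proposed answer is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # restore unsteered behavior
```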
Key facts
- Latent Agents is a post-training procedure for internalized multi-agent debate
- It uses a two-stage fine-tuning pipeline with debate structure learning and internalization
- Dynamic reward scheduling and length clipping are employed (see the sketch after this list)
- Internalized models match or exceed explicit multi-agent debate performance
- Up to 93% fewer tokens are used compared to explicit debate
- Activation steering reveals agent-specific subspaces in activation space
- As a practical demonstration, malicious agents can be instilled into the LLM via internalized debate
- The research is published on arXiv with ID 2604.24881
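One plausible reading of the reward scheduling and length clipping mentioned above is a reward that shifts weight from explicit debate structure toward the final answer over training, while clipping credit for over-long outputs. The sketch below is an assumption for illustration: the linear schedule, coefficients, and helper scores (structure_score, answer_score) are not from the paper, whose exact formulation may differ.

```python
# Hedged sketch of dynamic reward scheduling with length clipping.
# Schedule shape and all constants are illustrative assumptions.

def scheduled_reward(structure_score: float,
                     answer_score: float,
                     step: int,
                     total_steps: int,
                     num_tokens: int,
                     max_tokens: int = 512,
                     length_penalty: float = 0.001) -> float:
    """Blend a debate-structure reward into a task reward over training."""
    # Early in training, weight the explicit debate structure; later,
    # shift weight toward the final answer so the debate is internalized.
    w = min(1.0, step / max(1, total_steps))
    reward = (1.0 - w) * structure_score + w * answer_score

    # Length clipping: tokens past the budget earn no credit and incur
    # a linear penalty, discouraging verbose internal debates.
    overflow = max(0, num_tokens - max_tokens)
    return reward - length_penalty * overflow

# Example: mid-training step with a slightly over-budget completion.
r = scheduled_reward(structure_score=0.8, answer_score=0.6,
                     step=5_000, total_steps=10_000,
                     num_tokens=600, max_tokens=512)
print(f"{r:.3f}")  # 0.7 blended reward minus 0.088 length penalty = 0.612
```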
Entities
Institutions
- arXiv