New AI Method Distills Multi-Agent Debate into Single LLM
Researchers have developed Latent Agents, a post-training method that internalizes multi-agent debate into a single large language model (LLM). A two-stage fine-tuning pipeline first teaches the model explicit debate structure and then internalizes it, aided by dynamic reward scheduling and length clipping. The resulting internalized models match or exceed explicit multi-agent debate while using up to 93% fewer tokens. Probing the mechanism with activation steering, the researchers find that internalization creates agent-specific subspaces: distinct directions in activation space that encode different agent perspectives. As a practical demonstration, they show that malicious agents can be instilled into the LLM via internalized debate.
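To make the steering idea concrete, here is a minimal sketch of adding an agent-specific direction to a model's hidden states via a PyTorch forward hook. Everything here is an illustrative assumption rather than the paper's actual setup: gpt2 is a stand-in base model, the layer index and steering strength are arbitrary, and the random placeholder vector stands in for a learned agent direction (e.g., a mean activation difference between agent-persona and neutral prompts).

```python
# Hedged sketch: steering activations along a hypothetical agent-specific
# direction. The model, layer, strength, and direction are all placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper's base model is not specified here
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 6
d_model = model.config.hidden_size
agent_direction = torch.randn(d_model)   # placeholder for a learned direction
agent_direction /= agent_direction.norm()
alpha = 4.0  # steering strength; sign and magnitude would be tuned empirically

def steer_hook(module, inputs, output):
    # GPT-2 decoder blocks return a tuple whose first element is the
    # hidden states; shift them along the agent direction and pass the
    # rest of the tuple through unchanged.
    hidden = output[0] + alpha * agent_direction.to(output[0].dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(steer_hook)
ids = tok("The proposed answer is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()  # restore unsteered behavior
```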
Key facts
- Latent Agents is a post-training procedure for internalized multi-agent debate
- It uses a two-stage fine-tuning pipeline with debate structure learning and internalization
- Dynamic reward scheduling and length clipping are employed (see the sketch after this list)
- Internalized models match or exceed explicit multi-agent debate performance
- Up to 93% fewer tokens are used compared to explicit debate
- Activation steering reveals agent-specific subspaces in activation space
- As a practical demonstration, malicious agents can be instilled into the LLM via internalized debate
- The research is published on arXiv with ID 2604.24881
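One plausible reading of the reward scheduling and length clipping mentioned above is a reward that shifts weight from explicit debate structure toward the final answer over training, while clipping credit for over-long outputs. The sketch below is an assumption for illustration: the linear schedule, coefficients, and helper scores (structure_score, answer_score) are not from the paper, whose exact formulation may differ.

```python
# Hedged sketch of dynamic reward scheduling with length clipping.
# Schedule shape and all constants are illustrative assumptions.

def scheduled_reward(structure_score: float,
                     answer_score: float,
                     step: int,
                     total_steps: int,
                     num_tokens: int,
                     max_tokens: int = 512,
                     length_penalty: float = 0.001) -> float:
    """Blend a debate-structure reward into a task reward over training."""
    # Early in training, weight the explicit debate structure; later,
    # shift weight toward the final answer so the debate is internalized.
    w = min(1.0, step / max(1, total_steps))
    reward = (1.0 - w) * structure_score + w * answer_score

    # Length clipping: tokens past the budget earn no credit and incur
    # a linear penalty, discouraging verbose internal debates.
    overflow = max(0, num_tokens - max_tokens)
    return reward - length_penalty * overflow

# Example: mid-training step with a slightly over-budget completion.
r = scheduled_reward(structure_score=0.8, answer_score=0.6,
                     step=5_000, total_steps=10_000,
                     num_tokens=600, max_tokens=512)
print(f"{r:.3f}")  # 0.7 blended reward minus 0.088 length penalty = 0.612
```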
Entities
Institutions
- arXiv