ATLAS: Multi-LLM Training Framework with Adaptive Reference Evolution

ai-technology · 2026-05-23

Researchers have developed ATLAS (Adaptive Task-distributed Learning for Agentic Self-evolution), a multi-agent framework that addresses a shortcoming of existing multi-LLM systems: their reliance on static fine-tuning or frozen agents. In ATLAS, specialized meta-agents work together to train an active agent toward a domain-specific policy. Its central contribution is the Evolving Direct Preference Optimization (EvoDPO) algorithm, in which an inspection agent adaptively updates the reference policy through proxy-KL gating informed by ongoing training telemetry. This targets a core problem in iterative preference learning, where a fixed reference model can make updates overly conservative or cause training to stagnate. The framework was evaluated on a range of challenging tasks, and the findings are published as arXiv preprint 2602.02709.
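
To make the idea of a gated reference update concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm from the preprint: the summary does not give EvoDPO's actual gating rule, so the names `proxy_kl`, `inspect`, and `kl_threshold`, the threshold value, and the choice to refresh the reference once the estimated divergence exceeds the threshold are all hypothetical. Only `dpo_loss` follows the standard published DPO objective for a single preference pair.

```python
import math
from dataclasses import dataclass


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair, given sequence log-probs."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


def proxy_kl(policy_logps: list[float], ref_logps: list[float]) -> float:
    """Assumed proxy: mean log-prob gap between policy and reference.

    If the sequences were sampled from the policy, this is a Monte Carlo
    estimate of KL(policy || reference).
    """
    gaps = [p - r for p, r in zip(policy_logps, ref_logps)]
    return sum(gaps) / len(gaps)


@dataclass
class GateDecision:
    proxy_kl: float
    update_reference: bool


def inspect(policy_logps: list[float], ref_logps: list[float],
            kl_threshold: float = 0.5) -> GateDecision:
    """Hypothetical inspection step: decide whether to refresh the reference.

    Assumed rule: refresh once the policy has drifted past kl_threshold,
    so that a stale reference stops holding back further updates.
    """
    kl = proxy_kl(policy_logps, ref_logps)
    return GateDecision(proxy_kl=kl, update_reference=kl > kl_threshold)


if __name__ == "__main__":
    # Log-probs of a few sampled responses under each model (made-up numbers).
    decision = inspect([-1.0, -2.0], [-2.0, -3.0])
    if decision.update_reference:
        print("refresh reference policy from current policy weights")
```

In a training loop, a step like `inspect` would run on logged telemetry between preference-optimization rounds, and the reference log-probs passed to `dpo_loss` would come from whichever reference snapshot the gate last approved.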

Key facts

ATLAS is a multi-agent framework for training LLMs.
It uses specialized meta-agents to train an active agent.
EvoDPO enables adaptive reference policy updates.
The framework addresses limitations of static fine-tuning.
EvoDPO uses proxy-KL gated updates based on training telemetry.
The work was published on arXiv with ID 2602.02709.
The approach targets domain-specific policy refinement.
Evaluation was performed on diverse challenging tasks.
