MAD-OPD: Multi-Agent Debate Breaks Teacher Ceiling in On-Policy Distillation

other · 2026-05-06

Researchers have introduced MAD-OPD (Multi-Agent Debate-driven On-Policy Distillation), a technique that addresses the capability ceiling imposed by a single teacher in on-policy distillation (OPD). Instead of one teacher, a group of teachers debates over the student's on-policy state, producing emergent collective intelligence that supplies token-level guidance; each teacher's contribution is weighted by its post-debate confidence. To extend OPD to agentic tasks, the authors also present On-Policy Agentic Distillation (OPAD), which adds step-level sampling to stabilize training under multi-step error accumulation. The paper is available on arXiv (2605.01347).
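To make the confidence weighting concrete, here is a minimal sketch of how several teachers' next-token distributions could be blended into one token-level target. The summary does not give the paper's exact formula, so the linear mixture, the normalization of confidences, and the function name `mix_teachers` are all assumptions made for illustration.

```python
def mix_teachers(teacher_probs, confidences):
    """Blend per-teacher next-token distributions into a single target.

    teacher_probs: list of probability vectors, one per teacher.
    confidences:   list of non-negative post-debate confidence scores.
    Assumption: weights are the confidences normalized to sum to 1.
    """
    if len(teacher_probs) != len(confidences):
        raise ValueError("need one confidence score per teacher")
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one confidence must be positive")
    weights = [c / total for c in confidences]
    vocab_size = len(teacher_probs[0])
    # Weighted average of the teachers' probabilities for each vocabulary entry.
    return [
        sum(w * probs[i] for w, probs in zip(weights, teacher_probs))
        for i in range(vocab_size)
    ]


# Two hypothetical teachers over a toy 2-token vocabulary.
target = mix_teachers([[1.0, 0.0], [0.0, 1.0]], confidences=[3.0, 1.0])
```

With these toy inputs the more confident teacher dominates the blended target. A real system would operate on logits from actual models rather than hand-written vectors.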

Key facts

MAD-OPD uses multi-agent debate to break the single-teacher ceiling in on-policy distillation.
Teachers debate over the student's on-policy state to produce emergent collective intelligence.
Each teacher's contribution is weighted by its post-debate confidence.
OPAD adds step-level sampling to stabilize training for agentic tasks.
The paper is available on arXiv with ID 2605.01347.
On-policy distillation trains a student on its own trajectories under token-level teacher supervision.
Existing OPD methods are capped by a single-teacher capability ceiling.
OPD was largely unexplored in agentic tasks before this work.
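
The on-policy distillation setup described in these facts can be sketched as a per-token loss on sequences the student itself sampled. Reverse KL, KL(student || teacher), is one common choice of divergence for this; the summary does not say which divergence the paper uses, so treat it as an assumption. The `sample_steps` helper is likewise only one hypothetical reading of "step-level sampling" (supervising a random subset of steps in a multi-step trajectory), not the authors' procedure.

```python
import math
import random


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) at a single token position."""
    return sum(
        s * (math.log(s) - math.log(max(t, eps)))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0.0  # terms with zero student probability contribute nothing
    )


def opd_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over a student-sampled sequence.

    Both arguments are lists of probability vectors, one per position,
    evaluated on the same student-generated prefix (the on-policy state).
    """
    losses = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(losses) / len(losses)


def sample_steps(num_steps, k, seed=0):
    """Hypothetical step-level sampler: choose up to k distinct step indices."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_steps), min(k, num_steps)))


# Toy usage: the loss is zero when the student already matches the teacher.
matched = opd_loss([[0.5, 0.5]], [[0.5, 0.5]])
mismatch = reverse_kl([1.0, 0.0], [0.5, 0.5])
steps = sample_steps(num_steps=10, k=3)
```

Because the sequences come from the student's own policy, the loss penalizes the student exactly on the states it actually visits, which is what distinguishes on-policy distillation from training on teacher-generated text.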
