ARTFEED — Contemporary Art Intelligence

MAD-OPD: Multi-Agent Debate Breaks Teacher Ceiling in On-Policy Distillation

other · 2026-05-06

Researchers have introduced MAD-OPD (Multi-Agent Debate-driven On-Policy Distillation), a technique that lifts the capability ceiling imposed by relying on a single teacher in on-policy distillation (OPD). Several teachers debate over the student's on-policy state, and the resulting collective intelligence supplies token-level guidance, with each teacher's contribution weighted by its confidence after the debate. To extend OPD to agentic tasks, the authors present On-Policy Agentic Distillation (OPAD), which adds step-level sampling to keep training stable as errors accumulate across multiple steps. The paper is available on arXiv (2605.01347).
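As a rough illustration of confidence-weighted token-level guidance, the sketch below mixes several teachers' next-token distributions into one target and scores a student distribution against it. The summary does not give the paper's actual formulas, so everything here is an assumption: the function names `mix_teachers` and `reverse_kl` are hypothetical, confidences are treated as non-negative scalars normalized by their sum, and the reverse KL divergence stands in for whatever distillation loss the authors use.

```python
import math


def mix_teachers(teacher_probs, confidences):
    """Confidence-weighted average of teacher next-token distributions.

    teacher_probs: one probability list per teacher, all over the same vocabulary.
    confidences:   one non-negative post-debate score per teacher (assumed scalar).
    """
    total = sum(confidences)
    if total <= 0:
        raise ValueError("at least one teacher needs positive confidence")
    # Assumption: weights are the confidences normalized to sum to 1.
    weights = [c / total for c in confidences]
    vocab_size = len(teacher_probs[0])
    return [
        sum(w * probs[v] for w, probs in zip(weights, teacher_probs))
        for v in range(vocab_size)
    ]


def reverse_kl(student, target, eps=1e-12):
    """KL(student || target) at a single token position.

    Assumption: reverse KL is used here only as a common choice for
    on-policy distillation, not as the paper's confirmed loss.
    """
    return sum(
        s * math.log((s + eps) / (t + eps))
        for s, t in zip(student, target)
        if s > 0
    )


# Two hypothetical teachers over a two-token vocabulary.
teachers = [[0.8, 0.2], [0.4, 0.6]]
target = mix_teachers(teachers, confidences=[0.75, 0.25])
loss = reverse_kl([0.5, 0.5], target)
```

In this toy setup the more confident teacher pulls the mixed target toward its own distribution, and the loss shrinks to zero as the student's distribution approaches that target.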

Key facts

  • MAD-OPD uses multi-agent debate to break the single-teacher ceiling in on-policy distillation.
  • Teachers debate over the student's on-policy state to produce emergent collective intelligence.
  • Each teacher's contribution is weighted by its post-debate confidence.
  • OPAD adds step-level sampling to stabilize training for agentic tasks.
  • The paper is available on arXiv with ID 2605.01347.
  • On-policy distillation trains a student on its own trajectories under token-level teacher supervision.
  • Existing OPD methods are capped by a single-teacher capability ceiling.
  • OPD was largely unexplored in agentic tasks before this work.
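
The summary does not describe how OPAD's step-level sampling works, so the following is only one possible reading, sketched under stated assumptions: rather than training on whole agent trajectories, individual steps are pooled across the student's own rollouts and a batch is drawn uniformly from that pool. The function name `sample_steps` and the trajectory representation (a list of steps per rollout) are hypothetical.

```python
import random


def sample_steps(trajectories, batch_size, seed=None):
    """Draw individual steps uniformly from a pool of on-policy trajectories.

    trajectories: list of rollouts, each a list of step records (any type).
    Returns a list of (trajectory_index, step_index, step) tuples.
    Assumption: uniform sampling without replacement over all steps.
    """
    rng = random.Random(seed)
    pool = [
        (t_idx, s_idx, step)
        for t_idx, trajectory in enumerate(trajectories)
        for s_idx, step in enumerate(trajectory)
    ]
    # Never ask for more steps than the pool contains.
    k = min(batch_size, len(pool))
    return rng.sample(pool, k)


# Two hypothetical rollouts of different lengths.
rollouts = [["a0", "a1", "a2"], ["b0"]]
batch = sample_steps(rollouts, batch_size=2, seed=0)
```

Under this reading, a long rollout that drifts late in the episode contributes only some of its steps to any one batch, which is one plausible way to limit the effect of accumulated errors on a single update.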

Entities

Institutions

  • arXiv

Sources