Multi-Teacher On-Policy Distillation for LLM Capability Recovery

ai-technology · 2026-05-27

A new arXiv paper (2605.27115) addresses how to recover general capabilities in LLMs after domain specialization. Specialization improves behavior in a vertical domain but often weakens general capabilities. The authors propose a counteraction-aware multi-teacher on-policy distillation (MOPD) method that works with readily available proxy general prompts, so the hidden training distribution does not have to be reconstructed. They identify two failure modes in vanilla MOPD: recovery-preservation counteraction, caused by conflicting gradients, and weak-signal flattening, caused by uniform averaging. The method aims to balance recovery and preservation.

Key facts

arXiv paper 2605.27115
Addresses general capability recovery after domain specialization
Proposes counteraction-aware multi-teacher on-policy distillation
Uses proxy general prompts instead of reconstructing hidden distributions
Identifies recovery-preservation counteraction and weak-signal flattening as failure modes
Focuses on balancing recovery and preservation
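
The two failure modes named above can be illustrated with a toy example. The sketch below is not the paper's method; it only shows, under stated assumptions, what "conflicting gradients" and "uniform averaging" can look like numerically. The three gradient vectors (`g_recovery`, `g_preserve`, `g_weak`) and the helper names are hypothetical, chosen purely for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity; a negative value means the two directions conflict."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def uniform_average(grads):
    """Combine per-teacher gradients with equal weights (the 'vanilla' choice)."""
    n = len(grads)
    return [sum(column) / n for column in zip(*grads)]

# Hypothetical per-teacher update directions (assumed values, not from the paper).
g_recovery = [1.0, 0.5, 0.0]   # teacher pushing toward general capability
g_preserve = [-0.8, 0.5, 0.0]  # teacher pushing to keep domain behavior
g_weak = [0.0, 0.0, 0.3]       # teacher with a small but distinct signal

# Counteraction: the two main teachers disagree on the first coordinate.
conflict = cosine(g_recovery, g_preserve)

# Uniform averaging: the opposing components largely cancel, and the
# weak teacher's signal is divided by the number of teachers.
avg = uniform_average([g_recovery, g_preserve, g_weak])

print("cosine(recovery, preserve):", round(conflict, 3))
print("uniform average:", [round(v, 3) for v in avg])
```

In this toy setting the cosine similarity is negative, the first coordinate of the average is far smaller than either teacher asked for, and the weak teacher's component shrinks to a third of its original size. A counteraction-aware scheme would presumably weight or reconcile the teachers differently, but the summary does not say how.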
