ARTFEED — Contemporary Art Intelligence

Multi-Teacher On-Policy Distillation for LLM Capability Recovery

ai-technology · 2026-05-27

A new arXiv paper (2605.27115) addresses how to recover general capabilities in LLMs after domain specialization, which improves behavior in vertical domains but often weakens general capabilities. The authors propose a counteraction-aware multi-teacher on-policy distillation (MOPD) method that works with readily available proxy general prompts, so the hidden training distributions do not need to be reconstructed. They identify two failure modes in vanilla MOPD: recovery-preservation counteraction, caused by conflicting gradients, and weak-signal flattening, caused by uniform averaging. The method aims to balance recovery and preservation.
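The two failure modes can be pictured with a toy calculation. The sketch below is not the paper's method and uses made-up numbers. It assumes "counteraction" means that the update from a general-capability teacher and the update from a domain teacher point in opposing directions, and that "flattening" means a lone informative teacher gets diluted when all K teachers receive equal weight 1/K. All names (`g_general`, `g_domain`, `teacher_signals`) are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def uniform_average(vectors):
    """Element-wise mean with equal weight 1/K for each of K vectors."""
    k = len(vectors)
    return [sum(column) / k for column in zip(*vectors)]

# Hypothetical gradients on a 2-parameter student (made-up numbers).
g_general = [1.0, 0.2]   # assumed: from a general-capability teacher
g_domain = [-0.9, 0.3]   # assumed: from a domain-specialist teacher

# Counteraction: a negative cosine means the two updates oppose each other,
# so their plain average is shorter than either gradient on its own.
conflict = cosine(g_general, g_domain) < 0
avg_grad = uniform_average([g_general, g_domain])
shrunk = norm(avg_grad) < min(norm(g_general), norm(g_domain))

# Flattening: one informative teacher among K = 3, the others near zero.
# Equal weighting scales the informative signal down by a factor of K.
teacher_signals = [[0.9], [0.0], [0.0]]
flattened = uniform_average(teacher_signals)[0]

print(conflict, shrunk, flattened)
```

In this toy setting the averaged gradient is much shorter than either input, and the single strong signal is cut to a third of its size. Those are the two effects a counteraction-aware weighting would presumably need to avoid, though the summary does not say how the paper does so.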

Key facts

  • arXiv paper 2605.27115
  • Addresses general capability recovery after domain specialization
  • Proposes counteraction-aware multi-teacher on-policy distillation
  • Uses proxy general prompts instead of reconstructing hidden distributions
  • Identifies recovery-preservation counteraction and weak-signal flattening as failure modes
  • Focuses on balancing recovery and preservation

Entities

Institutions

  • arXiv
