On-Policy Distillation Method for LLMs Introduced

publication · 2026-05-25

A new arXiv paper (2605.23493) presents EDGE-OPD, an on-policy distillation method for large language models. In On-Policy Distillation (OPD), the student is trained on outputs sampled from its own policy, which improves LLM capabilities without distribution drift. On-Policy Self-Distillation (OPSD) uses a single model as both student and teacher: the teacher copy is given privileged context that is absent at inference time. However, that privileged information can cause unintended behavioral changes, such as modified reasoning and degraded capabilities. EDGE-OPD addresses this by internalizing the privileged context with evidence-guided techniques.
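To make the OPSD setup concrete, the following is a minimal sketch of a generic self-distillation loss, not the EDGE-OPD method itself. It assumes a per-token reverse KL divergence between the model without privileged context (student) and the same model with it (teacher), evaluated on a sequence the student sampled. The function names, the choice of reverse KL, and the toy logits are all illustrative assumptions and do not come from the paper.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) at one token position."""
    return sum(
        s * math.log(s / t)
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )

def opsd_loss(student_logits, teacher_logits):
    """Mean per-token reverse KL over a student-sampled sequence.

    student_logits[i]: logits at position i WITHOUT privileged context.
    teacher_logits[i]: logits of the same model WITH privileged context.
    Both are hypothetical inputs; a real setup would obtain them from
    two forward passes of one model over the same sampled tokens.
    """
    per_token = [
        reverse_kl(softmax(s), softmax(t))
        for s, t in zip(student_logits, teacher_logits)
    ]
    return sum(per_token) / len(per_token)

# Toy example: vocabulary of 3 tokens, sequence of 2 positions.
student = [[2.0, 0.5, 0.1], [0.3, 0.3, 0.3]]
teacher = [[2.0, 0.5, 0.1], [1.5, 0.2, 0.1]]
loss = opsd_loss(student, teacher)
print(f"toy OPSD loss: {loss:.4f}")
```

In this toy case the first position contributes nothing because student and teacher agree, so the loss comes entirely from the second position, where the privileged context shifts the teacher's distribution. A loss of this generic form pulls the student toward everything the context changes, which is the side-effect problem the summary describes.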

Key facts

Paper ID: arXiv:2605.23493
Method: EDGE-OPD
Focus: On-Policy Distillation for LLMs
OPSD uses a single model as both student and teacher
Examples of privileged context: a persona, a private fact, a worked solution
Challenge: privileged information can modify reasoning and degrade capabilities
Goal: train on the desired behavior, not its side effects
Published on arXiv
