On-Policy Distillation Method for LLMs Introduced
A new arXiv paper (2605.23493) presents EDGE-OPD, an on-policy distillation method for large language models. On-Policy Distillation (OPD) trains a model on its own sampled outputs, which improves LLM capabilities without distribution drift. On-Policy Self-Distillation (OPSD) uses a single model as both student and teacher, with the teacher given privileged context that is absent at inference time. However, that privileged information can cause unintended behavioral changes, such as modified reasoning or degraded capabilities. EDGE-OPD addresses this by internalizing the privileged context with evidence-guided techniques, so that training targets the desired behavior rather than its side effects.
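To make the OPSD setup above concrete, here is a minimal sketch of the kind of training signal on-policy self-distillation commonly uses. It is not the EDGE-OPD method, whose details this summary does not give. It assumes the loss is a per-token reverse KL divergence, KL(student || teacher), computed on tokens the student itself sampled, where the teacher is the same model scored with privileged context in its prompt. The function names (`reverse_kl`, `opsd_loss`) and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for one next-token distribution."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def opsd_loss(student_steps, teacher_steps):
    """Mean per-token reverse KL over one student-sampled response.

    student_steps: logits from the model WITHOUT privileged context.
    teacher_steps: logits from the same model WITH privileged context,
                   scored on the same student-sampled tokens (assumption).
    """
    if len(student_steps) != len(teacher_steps):
        raise ValueError("student and teacher must cover the same tokens")
    per_token = [reverse_kl(s, t) for s, t in zip(student_steps, teacher_steps)]
    return sum(per_token) / len(per_token)

# Toy example: two token positions over a three-token vocabulary.
student_steps = [[2.0, 0.5, 0.1], [0.3, 1.2, 0.0]]
teacher_steps = [[2.5, 0.2, 0.1], [0.3, 1.2, 0.0]]
loss = opsd_loss(student_steps, teacher_steps)
```

The loss is zero wherever the privileged context does not change the teacher's prediction and positive wherever it does. Under this assumed objective, the student is pulled toward every such change, which is one way to picture how side effects of the privileged context could be distilled along with the desired behavior.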
Key facts
- Paper ID: arXiv:2605.23493
- Method: EDGE-OPD
- Focus: On-Policy Distillation for LLMs
- OPSD uses a single model as both student and teacher
- Privileged context can be a persona, a private fact, or a worked solution
- Challenge: privileged information can modify reasoning and degrade capabilities
- Goal: train on desired behavior, not side effects
- Published on arXiv
Entities
Institutions
- arXiv