AVSD: Adaptive-View Self-Distillation for Language Models
Researchers introduce AVSD (Adaptive-View Self-Distillation), a method that lets a language model learn from several types of privileged information at once. In self-distillation, the same model serves as both student and teacher, and the teacher is conditioned on privileged information that the student does not see, such as solutions, demonstrations, feedback, or final answers. This yields dense token-level feedback without an external model, but it also creates an asymmetry: the teacher's predictions depend on view-specific information that the student cannot access at inference time. In addition, the most useful type of privileged information depends on the task, which makes it hard to commit to a single teacher. AVSD addresses both problems by distilling from multiple privileged-information views: it reconstructs token-level supervision by separating the signal the teachers share (consensus) from the signal specific to each teacher, then adaptively balances the two to improve student learning. The work is available on arXiv under ID 2605.20643.
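The summary above does not give AVSD's actual formulas, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes each privileged-information view yields a teacher distribution over the vocabulary at a given token position. It takes the arithmetic mean of those distributions as the "consensus" and uses a Jensen-Shannon-style agreement score as a hypothetical gate on the teacher-specific term. All function names (`consensus`, `agreement`, `token_loss`, `sequence_loss`) and the gating rule are illustrative assumptions.

```python
import math

EPS = 1e-12  # numerical guard for log(0); an implementation detail of this sketch


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))


def consensus(teachers):
    """Assumed consensus signal: arithmetic mean of the per-view teacher distributions."""
    k = len(teachers)
    return [sum(column) / k for column in zip(*teachers)]


def agreement(teachers):
    """1 minus the normalized generalized Jensen-Shannon divergence, clamped to [0, 1]."""
    if len(teachers) < 2:
        return 1.0
    mean = consensus(teachers)
    jsd = sum(kl(t, mean) for t in teachers) / len(teachers)
    # With uniform weights over K teachers, the divergence is at most log(K).
    return min(1.0, max(0.0, 1.0 - jsd / math.log(len(teachers))))


def token_loss(student, teachers):
    """Hypothetical per-token loss: consensus term plus a gated teacher-specific term."""
    shared = kl(consensus(teachers), student)
    specific = sum(kl(t, student) for t in teachers) / len(teachers)
    # Assumption: down-weight view-specific supervision where the views disagree,
    # since that signal depends on information the student never sees.
    return shared + agreement(teachers) * specific


def sequence_loss(student_seq, teachers_seq):
    """Average token loss; teachers_seq[t] lists the view distributions at position t."""
    losses = [token_loss(s, ts) for s, ts in zip(student_seq, teachers_seq)]
    return sum(losses) / len(losses)


# Toy example: two views (say, a solution view and a feedback view), 3-token vocabulary.
solution_view = [0.7, 0.2, 0.1]
feedback_view = [0.6, 0.3, 0.1]
student = [0.4, 0.4, 0.2]
print(token_loss(student, [solution_view, feedback_view]))
```

In a real training setup, the teacher distributions would come from the same model prompted with different privileged context, and the loss would be computed on logits with automatic differentiation. The gate here is one arbitrary choice among many ways to balance shared and view-specific signals.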
Key facts
- AVSD stands for Adaptive-View Self-Distillation.
- It is a method for language model self-distillation with multiple privileged-information views.
- Self-distillation uses the same model as student and teacher.
- Privileged information includes solutions, demonstrations, feedback, or final answers.
- The teacher has access to privileged information unavailable to the student at inference.
- The best type of privileged information is task-dependent.
- AVSD reconstructs token-level supervision by separating consensus and teacher-specific signals.
- The work is announced on arXiv with ID 2605.20643.
Entities
Institutions
- arXiv