ARTFEED — Contemporary Art Intelligence

Adaptive Teacher Exposure Improves LLM Reasoning Self-Distillation

other · 2026-05-13

A new paper on arXiv (2605.11458) challenges the default practice in on-policy self-distillation for large language model (LLM) reasoning, in which the teacher model always sees the full reference reasoning. The authors identify a 'teacher-side exposure mismatch': conditioning the teacher on reasoning beyond the student's current competence yields targets too difficult for the student to learn from. A controlled fixed-exposure sweep shows that full exposure is not always optimal and that the mismatch grows as the teacher sees more privileged reasoning. They propose Adaptive Teacher Exposure, which treats exposure as a learnable training-time control variable rather than a fixed setting. Evaluated on mathematical reasoning benchmarks, the method improves student performance. The work was submitted in May 2026.
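The moving parts are easy to sketch. The following is a minimal, illustrative PyTorch version, not the paper's implementation: it assumes exposure is a scalar fraction of reference-reasoning tokens prepended to the teacher's context, and update_exposure is a hypothetical proportional controller standing in for whatever rule the authors actually learn (TinyLM, distill_step, and update_exposure are all invented names).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyLM(nn.Module):
        """Stand-in for an LLM: embeds token ids, emits next-token logits."""
        def __init__(self, vocab: int = 100, dim: int = 32):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.head = nn.Linear(dim, vocab)

        def forward(self, ids: torch.Tensor) -> torch.Tensor:
            return self.head(self.emb(ids))  # (batch, length, vocab)

    def distill_step(student, teacher, prompt, reference, rollout, exposure):
        """One distillation step in which the teacher conditions on only the
        first `exposure` fraction of the reference reasoning (an assumed form
        of the control variable, not the paper's exact parameterization)."""
        k = int(exposure * reference.size(1))
        t_in = torch.cat([prompt, reference[:, :k], rollout], dim=1)  # privileged
        s_in = torch.cat([prompt, rollout], dim=1)                    # on-policy
        with torch.no_grad():
            t_logits = teacher(t_in)
        s_logits = student(s_in)
        T = rollout.size(1)
        # Logits at position i predict token i+1, so the distributions over
        # the rollout tokens occupy the last T prediction positions.
        t_pred = t_logits[:, -T - 1:-1, :]
        s_pred = s_logits[:, -T - 1:-1, :]
        return F.kl_div(F.log_softmax(s_pred, dim=-1),
                        F.softmax(t_pred, dim=-1), reduction="batchmean")

    def update_exposure(exposure, loss, target=3.0, lr=0.05):
        """Hypothetical controller: widen exposure when the student matches
        the teacher easily (low loss), narrow it when targets are too hard."""
        return min(1.0, max(0.0, exposure + lr * (target - float(loss))))

    if __name__ == "__main__":
        torch.manual_seed(0)
        student, teacher = TinyLM(), TinyLM()
        opt = torch.optim.Adam(student.parameters(), lr=1e-3)
        exposure = 0.25
        for step in range(50):
            prompt = torch.randint(0, 100, (2, 8))
            reference = torch.randint(0, 100, (2, 16))  # privileged reasoning
            rollout = torch.randint(0, 100, (2, 12))    # stands in for a sample
            loss = distill_step(student, teacher, prompt,
                                reference, rollout, exposure)
            opt.zero_grad()
            loss.backward()
            opt.step()
            exposure = update_exposure(exposure, loss)

In a real pipeline the rollout would be sampled from the student itself and the exposure schedule learned jointly with training; the sketch only isolates the key mechanism, namely that changing exposure changes the teacher's context and therefore the distillation targets.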

Key facts

  • Paper on arXiv: 2605.11458
  • Submitted in May 2026
  • Focuses on on-policy self-distillation for LLM reasoning
  • Identifies teacher-side exposure mismatch
  • Full exposure is not always the best choice (see the toy sweep after this list)
  • Student-teacher mismatch grows with more privileged reasoning
  • Proposes Adaptive Teacher Exposure as a learnable control variable
  • Evaluated on mathematical reasoning benchmarks
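The sweep finding is easy to make vivid with a toy model. The Python snippet below is purely illustrative, not the paper's experiment: learning_signal simply encodes the assumption that useful signal peaks when target difficulty (tied here to exposure) sits just above the student's current competence, which is enough to produce an interior optimum in a fixed-exposure sweep.

    import math

    def learning_signal(competence: float, exposure: float) -> float:
        # Toy assumption: target difficulty tracks exposure, and useful
        # learning signal peaks just above the student's competence.
        gap = exposure - competence
        return math.exp(-8.0 * (gap - 0.15) ** 2)

    def train(fixed_exposure: float, steps: int = 200) -> float:
        competence = 0.0
        for _ in range(steps):
            competence = min(1.0, competence
                             + 0.01 * learning_signal(competence, fixed_exposure))
        return competence

    for rho in (0.25, 0.5, 0.75, 1.0):
        print(f"exposure={rho:.2f} -> final competence {train(rho):.3f}")

Under this assumption the student trained at full exposure barely moves, while an intermediate exposure dominates, which matches the qualitative pattern the paper's controlled sweep reports on real models.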

Entities

Institutions

  • arXiv

Sources