CIST: Consistently Informative Soft-Label Temperature for Knowledge Distillation
A new method called CIST (Consistently Informative Soft-label Temperature) addresses a limitation of standard knowledge distillation by assigning separate, sample-adaptive temperatures to the teacher and the student. Standard distillation applies one fixed temperature to every sample, so the entropy of the teacher's soft labels varies widely from sample to sample: some remain overly sharp, while others are over-smoothed. CIST aims to keep the soft labels consistently informative across samples, improving knowledge transfer from a high-capacity teacher to a compact student.
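The entropy inconsistency described above can be reproduced in a few lines of plain Python. This is an illustrative sketch only: the logits, the global temperature of 4, the 0.8-nat target, and the helper `temperature_for_entropy` are all hypothetical. Solving per sample for the temperature that hits a fixed target entropy is one plausible reading of "consistently informative" soft labels, not necessarily the rule CIST itself uses. The search relies on a standard property: softmax entropy is non-decreasing in the temperature, so bisection finds the required T.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def temperature_for_entropy(logits: List[float], target: float,
                            lo: float = 1e-3, hi: float = 1e3,
                            iters: int = 100) -> float:
    """Hypothetical helper (not from the CIST paper): find T such that
    softmax(logits / T) has entropy `target`.

    Softmax entropy is non-decreasing in T, so bisection works. The search
    runs in log-space because useful temperatures span orders of magnitude.
    `target` must lie between the entropies at T = lo and T = hi.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if entropy(softmax(logits, mid)) < target:
            lo = mid  # too sharp: raise the temperature
        else:
            hi = mid  # too smooth: lower the temperature
    return math.sqrt(lo * hi)


# Made-up teacher logits for two samples over four classes.
confident = [12.0, 2.0, 1.0, 0.0]  # teacher is very sure of class 0
uncertain = [2.0, 1.8, 1.5, 0.0]   # teacher spreads mass over classes 0-2

# One global temperature (T = 4, chosen arbitrarily) gives very different entropies.
for name, z in [("confident", confident), ("uncertain", uncertain)]:
    print(f"fixed T=4.0  {name:9s} entropy={entropy(softmax(z, 4.0)):.3f}")

# Per-sample temperatures chosen to hit the same target entropy instead.
TARGET = 0.8  # nats; arbitrary, must be below log(4) ~= 1.386 for 4 classes
for name, z in [("confident", confident), ("uncertain", uncertain)]:
    t = temperature_for_entropy(z, TARGET)
    print(f"adaptive T={t:.2f} {name:9s} entropy={entropy(softmax(z, t)):.3f}")
```

With a fixed T, the confident sample's soft label stays much sharper than the uncertain one's. The adaptive variant raises the temperature for the confident sample and lowers it for the uncertain one, so both land at the same entropy.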
Key facts
- CIST assigns separate temperatures to teacher and student models.
- Standard fixed-temperature distillation is sample-agnostic.
- CIST addresses inconsistent entropy in teacher soft labels.
- CIST improves knowledge transfer from teacher to student.
- The method is proposed in arXiv:2605.20357.
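- Knowledge distillation transfers knowledge by training the student to match the teacher's predictive distribution.
- Temperature scaling softens the teacher's output distribution, exposing "dark knowledge" (the relative probabilities assigned to non-target classes) that hard labels do not convey.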
- CIST stands for Consistently Informative Soft-label Temperature.
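To make the "separate temperatures" idea concrete, here is a minimal sketch of a per-sample distillation loss in which teacher and student logits are each softened with their own temperature before taking the KL divergence from teacher to student. The function names, example logits, and per-sample temperature values are hypothetical. This summary does not say how CIST picks the student temperature or how it weights the loss, so both are left as plain inputs and no rescaling is applied.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float) -> List[float]:
    """log softmax(logits / temperature), computed stably via log-sum-exp."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def kd_loss(teacher_logits: List[float], student_logits: List[float],
            teacher_temp: float, student_temp: float) -> float:
    """KL(teacher soft label || student soft prediction) for one sample.

    Each side is softened with its own temperature. Classic KD uses one shared
    T and multiplies the loss by T**2; how CIST weights the loss is an open
    assumption here, so no rescaling is applied.
    """
    log_p = log_softmax(teacher_logits, teacher_temp)
    log_q = log_softmax(student_logits, student_temp)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def batch_kd_loss(teacher_batch: List[List[float]],
                  student_batch: List[List[float]],
                  teacher_temps: List[float],
                  student_temps: List[float]) -> float:
    """Mean per-sample KD loss, with one temperature pair per sample."""
    losses = [kd_loss(t, s, tt, st) for t, s, tt, st
              in zip(teacher_batch, student_batch, teacher_temps, student_temps)]
    return sum(losses) / len(losses)


teacher = [[12.0, 2.0, 1.0, 0.0], [2.0, 1.8, 1.5, 0.0]]
student = [[5.0, 1.0, 0.5, 0.0], [1.0, 0.9, 0.8, 0.0]]
# Hypothetical per-sample temperatures, e.g. produced by an entropy-matching rule.
teacher_temps = [6.0, 0.5]
student_temps = [3.0, 0.5]
print(f"batch KD loss: {batch_kd_loss(teacher, student, teacher_temps, student_temps):.4f}")
```

Only the ratio of logits to temperature enters the loss. A student whose logits are a positively scaled copy of the teacher's therefore reaches zero loss once its own temperature absorbs the scale. Decoupling the two temperatures relaxes the requirement that the student reproduce the teacher's logit magnitudes.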