CREDIT: A New Method for On-Policy Self-Distillation in Language Models
A new paper on arXiv (2605.11613) introduces CREDIT (Contrastive REward from DIsTillation), a method for on-policy self-distillation in language models. The authors analyze the token-level rewards produced by self-distillation and show that they correspond to Bayesian filtering increments whose sum equals the pointwise mutual information (pMI) between response and feedback given the input. Because pMI can be raised either by input-specific reasoning or by input-generic shortcuts, the authors decompose the teacher log-probability along the input axis to separate the two; CREDIT uses this decomposition to build contrastive rewards that improve token-level credit assignment.
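As a sketch of that claim (the notation here is assumed for illustration, not taken from the paper: input x, feedback f, response tokens y_1..y_T), the per-token Bayesian filtering increment telescopes to the pMI:

```latex
% Per-token reward as a Bayesian filtering increment (illustrative notation):
r_t \;=\; \log \frac{p(f \mid x,\, y_{1:t})}{p(f \mid x,\, y_{1:t-1})}

% Summing over the response telescopes, leaving only the endpoints:
\sum_{t=1}^{T} r_t
  \;=\; \log \frac{p(f \mid x,\, y_{1:T})}{p(f \mid x)}
  \;=\; \mathrm{pMI}(y_{1:T};\, f \mid x)
```

Each increment measures how much token y_t updates the probability of the feedback, so the total reward for a response is exactly the pMI between the full response and the feedback given the input.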
Key facts
- Paper: arXiv:2605.11613
- On-policy self-distillation paradigm
- Token-level rewards are Bayesian filtering increments
- Their sum equals the pointwise mutual information (pMI) between response and feedback given the input
- pMI can be raised by input-specific reasoning or input-generic shortcuts
- Proposes CREDIT (Contrastive REward from DIsTillation)
- Decomposes the teacher log-probability along the input axis (see the sketch after this list)
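Below is a minimal, hypothetical sketch of how a contrastive token reward could be formed from an input-axis decomposition. The mismatched-input baseline, function names, and tensor shapes are illustrative assumptions, not the paper's implementation:

```python
# Sketch of contrastive token-level rewards via an input-axis decomposition.
# Assumption: the "input-generic" term is approximated by scoring the same
# response under a mismatched input x'; this is an illustrative choice,
# not the paper's construction.
import torch

def contrastive_token_rewards(
    logp_teacher_true: torch.Tensor,     # (T,) log p_T(y_t | x,  y_<t), true input x
    logp_teacher_generic: torch.Tensor,  # (T,) log p_T(y_t | x', y_<t), mismatched input x'
) -> torch.Tensor:
    """Per-token reward = input-specific component of the teacher log-prob.

    Subtracting the input-generic score cancels credit the teacher would
    assign regardless of the input (shortcuts), leaving credit tied to x.
    """
    return logp_teacher_true - logp_teacher_generic

# Toy usage with stand-in log-probs for a 5-token response.
T = 5
logp_true = torch.rand(T).clamp_min(1e-6).log()     # teacher scores given x
logp_generic = torch.rand(T).clamp_min(1e-6).log()  # teacher scores given x'
rewards = contrastive_token_rewards(logp_true, logp_generic)
print(rewards)               # token-level credit
print(rewards.sum().item())  # summed return for the whole response
```

The design intuition, under these assumptions: tokens that the teacher rates highly only when the true input is present get positive reward, while tokens the teacher rates highly for any input contribute little, which is one way a contrastive reward could favor input-specific reasoning over input-generic shortcuts.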