CREDIT: A New Method for On-Policy Self-Distillation in Language Models
A new paper on arXiv (2605.11613) introduces CREDIT (Contrastive REward from DIsTillation), a method for on-policy self-distillation in language models. The authors analyze the token-level rewards produced by self-distillation and show that they correspond to Bayesian filtering increments whose sum equals the pointwise mutual information (pMI) between response and feedback given the input. Because pMI can be raised either by input-specific reasoning or by input-generic shortcuts, the authors decompose the teacher log-probability along the input axis to separate the two; CREDIT uses this decomposition to build contrastive rewards that improve token-level credit assignment.
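As a sketch of that claim (the notation here is assumed for illustration, not taken from the paper: input x, feedback f, response tokens y_1..y_T), the per-token Bayesian filtering increment telescopes to the pMI:

```latex
% Per-token reward as a Bayesian filtering increment (illustrative notation):
r_t \;=\; \log \frac{p(f \mid x,\, y_{1:t})}{p(f \mid x,\, y_{1:t-1})}

% Summing over the response telescopes, leaving only the endpoints:
\sum_{t=1}^{T} r_t
  \;=\; \log \frac{p(f \mid x,\, y_{1:T})}{p(f \mid x)}
  \;=\; \mathrm{pMI}(y_{1:T};\, f \mid x)
```

Each increment measures how much token y_t updates the probability of the feedback, so the total reward for a response is exactly the pMI between the full response and the feedback given the input.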
Key facts
- Paper: arXiv:2605.11613
- On-policy self-distillation paradigm
- Token-level rewards are Bayesian filtering increments
- Their sum equals the pointwise mutual information (pMI) between response and feedback given the input
- pMI can be raised by input-specific reasoning or input-generic shortcuts
- Proposes CREDIT (Contrastive REward from DIsTillation)
- Decomposes the teacher log-probability along the input axis (see the sketch after this list)
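Below is a minimal, hypothetical sketch of how a contrastive token reward could be formed from an input-axis decomposition. The mismatched-input baseline, function names, and tensor shapes are illustrative assumptions, not the paper's implementation:

```python
# Sketch of contrastive token-level rewards via an input-axis decomposition.
# Assumption: the "input-generic" term is approximated by scoring the same
# response under a mismatched input x'; this is an illustrative choice,
# not the paper's construction.
import torch

def contrastive_token_rewards(
    logp_teacher_true: torch.Tensor,     # (T,) log p_T(y_t | x,  y_<t), true input x
    logp_teacher_generic: torch.Tensor,  # (T,) log p_T(y_t | x', y_<t), mismatched input x'
) -> torch.Tensor:
    """Per-token reward = input-specific component of the teacher log-prob.

    Subtracting the input-generic score cancels credit the teacher would
    assign regardless of the input (shortcuts), leaving credit tied to x.
    """
    return logp_teacher_true - logp_teacher_generic

# Toy usage with stand-in log-probs for a 5-token response.
T = 5
logp_true = torch.rand(T).clamp_min(1e-6).log()     # teacher scores given x
logp_generic = torch.rand(T).clamp_min(1e-6).log()  # teacher scores given x'
rewards = contrastive_token_rewards(logp_true, logp_generic)
print(rewards)               # token-level credit
print(rewards.sum().item())  # summed return for the whole response
```

The design intuition, under these assumptions: tokens that the teacher rates highly only when the true input is present get positive reward, while tokens the teacher rates highly for any input contribute little, which is one way a contrastive reward could favor input-specific reasoning over input-generic shortcuts.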