ARTFEED — Contemporary Art Intelligence

CREDIT: A New Method for On-Policy Self-Distillation in Language Models

other · 2026-05-13

A new paper on arXiv (2605.11613) introduces CREDIT (Contrastive REward from DIsTillation), a method for on-policy self-distillation in language models. The authors analyze the token-level rewards produced by self-distillation and show that they correspond to Bayesian filtering increments whose sum equals the pointwise mutual information (pMI) between the response and the feedback, given the input. Because pMI can be raised either by input-specific reasoning or by input-generic shortcuts, they decompose the teacher log-probability along the input axis to separate the two, and CREDIT uses this contrastive decomposition as a reward signal to sharpen token-level credit assignment.
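The telescoping-sum property claimed above can be written out in generic notation. This is a reconstruction from the summary, not the paper's exact equations: x denotes the input, y the response tokens, and f the feedback signal.

```latex
% Token-level reward as a Bayesian filtering increment:
% each response token updates the posterior over the feedback f.
r_t = \log p(f \mid x, y_{1:t}) - \log p(f \mid x, y_{1:t-1})

% Summing over the response telescopes to pointwise mutual information:
\sum_{t=1}^{T} r_t
  = \log p(f \mid x, y_{1:T}) - \log p(f \mid x)
  = \log \frac{p(y_{1:T}, f \mid x)}{p(y_{1:T} \mid x)\, p(f \mid x)}
  = \mathrm{pMI}(y_{1:T};\, f \mid x)
```

The telescoping makes clear why the sum, and not any individual increment, is what the total pMI constrains: a high sum can come from many small input-specific increments or from a few input-generic ones.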

Key facts

  • Paper arXiv:2605.11613
  • Announce type: cross
  • On-policy self-distillation paradigm
  • Token rewards are Bayesian filtering increments
  • Sum equals pointwise mutual information (pMI)
  • pMI can be raised by input-specific reasoning or input-generic shortcuts
  • Proposes CREDIT (Contrastive REward from DIsTillation)
  • Decomposes teacher log-probability along input axis
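The contrastive-reward idea in the facts above can be sketched numerically. The function name and the specific form (teacher log-probability conditioned on the input, minus the same teacher's log-probability with the input removed) are illustrative assumptions, not the paper's exact formulation.

```python
def contrastive_token_rewards(logp_with_input, logp_without_input):
    """Per-token contrastive reward: teacher log-prob of each response
    token given the input, minus the teacher log-prob of the same token
    with the input masked out. Tokens the teacher rates highly *because*
    of the input (input-specific reasoning) score positive; tokens rated
    highly regardless of the input (input-generic shortcuts) score near
    zero. Hypothetical sketch of the decomposition along the input axis.
    """
    assert len(logp_with_input) == len(logp_without_input)
    return [a - b for a, b in zip(logp_with_input, logp_without_input)]

# Toy example with three response tokens.
with_x = [-0.1, -0.5, -2.0]     # teacher log-probs conditioned on the input
without_x = [-3.0, -0.6, -2.0]  # teacher log-probs with the input masked
rewards = contrastive_token_rewards(with_x, without_x)
# First token is strongly input-specific; the last is input-generic.
```

The contrast is what distinguishes this from plain distillation reward: a token that is likely under the teacher even without the input contributes nothing, so shortcut completions stop being rewarded.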

Entities

Institutions

  • arXiv
