ARTFEED — Contemporary Art Intelligence

Calibration Framework for Probabilistic Label Ranking

other · 2026-06-01

A new study formalizes calibration for probabilistic label ranking, a task in which models predict a distribution over orderings of a label set. The authors define a hierarchy of calibration notions covering full rankings, sub-rankings, and top-k rankings. They prove that full-rank calibration implies the other two but not the reverse, and that sub-ranking and top-k calibration are incomparable: neither implies the other. Empirical tests show that popular label ranking models are often poorly calibrated, with substantial differences between sub-ranking and top-k metrics. The framework is also applied to RLHF reward models, where it reveals calibration issues in preference learning.
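To make the three levels concrete, the sketch below derives a top-k distribution and a sub-ranking distribution from one distribution over full rankings by summing probabilities. It is an illustration only: the toy distribution and the function names `top_k_marginal` and `sub_ranking_marginal` are assumptions made here, not definitions taken from the study.

```python
from collections import defaultdict
from itertools import permutations


def top_k_marginal(full_dist, k):
    """Sum probability over full rankings that share the same first k labels."""
    out = defaultdict(float)
    for ranking, p in full_dist.items():
        out[ranking[:k]] += p
    return dict(out)


def sub_ranking_marginal(full_dist, subset):
    """Sum probability over full rankings that induce the same order on `subset`."""
    keep = set(subset)
    out = defaultdict(float)
    for ranking, p in full_dist.items():
        induced = tuple(label for label in ranking if label in keep)
        out[induced] += p
    return dict(out)


# Toy distribution over all 6 orderings of 3 labels (hypothetical numbers).
labels = ("a", "b", "c")
weights = [0.4, 0.1, 0.2, 0.1, 0.1, 0.1]
full_dist = dict(zip(permutations(labels), weights))

top1 = top_k_marginal(full_dist, 1)                # which label is ranked first
pair_ab = sub_ranking_marginal(full_dist, ("a", "b"))  # relative order of a and b
```

Both derived distributions are coarser views of the same full-ranking distribution, but they group the rankings differently: the top-1 view groups by the first label, the sub-ranking view by the relative order of a chosen subset.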

Key facts

  • A model is calibrated when its predicted probabilities match the observed outcome frequencies.
  • Label ranking predicts a distribution over orderings of a label set.
  • Full-rank calibration implies sub-ranking and top-k calibration.
  • Sub-ranking and top-k calibration are incomparable.
  • Popular label ranking models are often poorly calibrated.
  • Substantial differences exist between sub-ranking and top-k metrics.
  • The framework is applied to RLHF reward models.
  • The study is published on arXiv with ID 2605.30447.
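
As a rough illustration of how a gap between predicted probabilities and observed frequencies can be measured, the sketch below computes a standard binned calibration error on pairwise preference probabilities, the kind of output a reward model might give for "response A is preferred over response B". The equal-width binning and the function name `binned_calibration_error` are assumptions chosen here for illustration; the study's own metrics may differ.

```python
def binned_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between mean predicted probability and
    observed frequency, computed over equal-width probability bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        error += len(members) / total * abs(mean_p - freq)
    return error


# Hypothetical data: the model says 0.9 four times, but is right only twice.
overconfident = binned_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])

# Hypothetical data: the model says 0.5 and is right half the time.
calibrated = binned_calibration_error([0.5, 0.5], [1, 0])
```

A value near zero means predicted probabilities track observed frequencies within each bin; larger values indicate over- or under-confidence.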
