ARTFEED — Contemporary Art Intelligence

Mitigating Cognitive Bias in RLHF by Context-Dependent Rationality

other · 2026-05-11

A new arXiv paper (2605.06895) proposes treating the rationality parameter in reinforcement learning from human feedback (RLHF) as context- and annotation-dependent, rather than a fixed constant, in order to mitigate cognitive biases in human judgments. The standard Boltzmann preference model assumes every annotator is uniformly reliable, but real human feedback is shaped by systematic biases that vary with context. The authors adjust the rationality parameter according to the annotation context, aiming to make reward models robust to imperfect human feedback.
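To make the idea concrete, here is a minimal sketch (not the paper's actual method) of a Boltzmann preference model where the rationality parameter beta is no longer a single constant but a function of the annotation context. The `context_beta` mapping and the `"difficult"` flag are hypothetical illustrations.

```python
import math

def boltzmann_pref(r_a, r_b, beta):
    """P(annotator prefers a over b) under a Boltzmann (Bradley-Terry) model.
    Higher beta = more rational annotator; beta -> 0 = random choices."""
    return 1.0 / (1.0 + math.exp(-beta * (r_a - r_b)))

# Standard RLHF: one fixed beta for every annotation.
p_fixed = boltzmann_pref(1.0, 0.0, beta=2.0)

# Context-dependent variant (illustrative): beta varies per annotation,
# e.g. assuming annotators are less reliable on hard-to-compare pairs.
def context_beta(context):
    # Hypothetical mapping from annotation context to rationality.
    return 0.5 if context["difficult"] else 2.0

p_easy = boltzmann_pref(1.0, 0.0, context_beta({"difficult": False}))
p_hard = boltzmann_pref(1.0, 0.0, context_beta({"difficult": True}))
```

Under this sketch, the same reward gap yields a confident preference probability in an easy context but a much noisier one in a difficult context, which is the behavior a fixed beta cannot capture.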

Key facts

  • arXiv paper 2605.06895 proposes context-dependent rationality in RLHF
  • Standard RLHF uses a fixed rationality parameter beta
  • Human feedback is affected by cognitive biases
  • The method treats rationality as context- and annotation-dependent
  • Goal is to make models robust to imperfect human feedback

Entities

Institutions

  • arXiv