Clinician Overrides as Implicit Preference Signals for Clinical AI
A new arXiv paper proposes treating clinician overrides of AI recommendations as implicit preference data. The signal resembles the human feedback used in RLHF (reinforcement learning from human feedback) but is richer, because it reflects domain expertise and is paired with observable outcomes. The authors introduce a five-category override taxonomy that maps override types to model update targets, a preference formulation conditioned on patient state, organizational context, and clinician capability, and a dual learning architecture that jointly trains a reward model and a capability model. The approach aims to prevent suppression bias, in which correct but difficult recommendations are systematically suppressed. The work targets value-based care settings.
Key facts
- Clinician overrides of AI recommendations are reframed as implicit preference data.
- The signal structure is similar to RLHF but richer due to domain expertise and observable outcomes.
- A five-category override taxonomy maps override types to model update targets.
- The preference formulation conditions on patient state s, organizational context c, and clinician capability κ (kappa).
- κ decomposes into execution capability κ_exec and alignment capability κ_align.
- A dual learning architecture jointly trains a reward model and a capability model via alternating optimization.
- The method aims to prevent suppression bias, the systematic suppression of correct-but-difficult recommendations.
- The paper is published on arXiv with ID 2604.28010.
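
The alternating optimization named in the key facts can be illustrated with a toy sketch. This is not the paper's model: it assumes a plain logistic acceptance model in which the probability that a clinician follows a recommendation is sigmoid(reward + capability), it collapses clinician capability into a single scalar per clinician instead of the execution/alignment split, and it ignores patient state and organizational context. The event log, the identifiers (dr_a, rec_hard, and so on), the function names, and the hyperparameters are all hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical toy log: (clinician, recommendation, accepted).
# accepted = 1 means the clinician followed the AI; 0 means an override.
EVENTS = [
    ("dr_a", "rec_easy", 1), ("dr_a", "rec_hard", 1), ("dr_a", "rec_hard", 1),
    ("dr_b", "rec_easy", 1), ("dr_b", "rec_hard", 0), ("dr_b", "rec_hard", 0),
    ("dr_c", "rec_easy", 1), ("dr_c", "rec_hard", 1), ("dr_c", "rec_hard", 0),
]


def log_likelihood(reward, capability, events):
    """Log-likelihood of the accept/override log under the assumed model."""
    total = 0.0
    for clin, rec, accepted in events:
        p = sigmoid(reward[rec] + capability[clin])
        total += math.log(p if accepted else 1.0 - p)
    return total


def fit_alternating(events, rounds=200, lr=0.1, l2=0.1):
    """Alternate gradient-ascent steps on the reward and capability blocks.

    The L2 penalty is an assumption added here to keep the two blocks
    identifiable; without it a constant could shift freely between them.
    """
    reward = {rec: 0.0 for _, rec, _ in events}
    capability = {clin: 0.0 for clin, _, _ in events}
    for _ in range(rounds):
        # Step 1: update the reward parameters with capability frozen.
        grad = {rec: -l2 * value for rec, value in reward.items()}
        for clin, rec, accepted in events:
            grad[rec] += accepted - sigmoid(reward[rec] + capability[clin])
        for rec in reward:
            reward[rec] += lr * grad[rec]
        # Step 2: update the capability parameters with reward frozen.
        grad = {clin: -l2 * value for clin, value in capability.items()}
        for clin, rec, accepted in events:
            grad[clin] += accepted - sigmoid(reward[rec] + capability[clin])
        for clin in capability:
            capability[clin] += lr * grad[clin]
    return reward, capability


if __name__ == "__main__":
    reward, capability = fit_alternating(EVENTS)
    print("reward:", reward)
    print("capability:", capability)
```

In this toy setup, overrides of rec_hard are explained partly by the per-clinician term rather than being charged entirely to the recommendation's reward, which is the intuition behind separating a capability model from a reward model. A faithful implementation would need the paper's actual preference formulation and override taxonomy.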
Entities
Institutions
- arXiv