Annotator Safety Policy Disagreement Sources Analyzed
A new arXiv paper (2605.05329) introduces a method to distinguish sources of annotation disagreement in AI safety policy. Disagreement can stem from operational failures, policy ambiguity, or value pluralism. Directly asking annotators for reasoning is costly and unreliable. The study proposes an approach to identify the root cause without increasing annotation burden.
Key facts
- arXiv paper 2605.05329
- Announce Type: new
- Disagreement sources: operational failures, policy ambiguity, value pluralism
- Direct reasoning elicitation is costly and unreliable
- Proposes method to distinguish sources without extra annotation burden
Entities
Institutions
- arXiv