Elicitation Protocols Affect Stated-Revealed Preference Gap in AI
A recent study published on arXiv examines how elicitation protocols influence the stated-revealed (SvR) preference gap in language models (LMs). The SvR gap is the discrepancy between the values that LMs endorse when asked directly and the choices they make in context. Current assessments rely mostly on binary forced-choice prompts, which entangle genuine preferences with artifacts of the protocol. The study evaluates 24 LMs and finds that permitting neutrality and abstention when eliciting stated preferences raises Spearman's rank correlation (ρ) between stated preferences and those revealed through forced choices. Conversely, allowing abstention when eliciting revealed preferences drives ρ to near zero or negative values because of high neutrality rates. In addition, steering models via the system prompt with their own stated preferences during revealed preference elicitation does not reliably improve SvR correlation.
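To make the metric concrete, the following is a minimal sketch of how Spearman's ρ between stated and revealed preference scores could be computed. It is not the study's code: the function names and the example scores are hypothetical, and the sketch assumes each value or item has already been reduced to one numeric stated score and one numeric revealed score per model.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        # One side has no rank variation (e.g. every answer is neutral),
        # so the correlation is undefined.
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five values (illustrative numbers, not study data).
stated = [0.9, 0.7, 0.5, 0.3, 0.1]
revealed = [0.8, 0.6, 0.7, 0.2, 0.1]
rho = spearman_rho(stated, revealed)  # 0.9 for these made-up scores
```

In this sketch, a revealed-preference vector in which every item receives the same neutral score has no rank variation, so ρ is undefined. This is only an illustration of how heavy neutrality can remove the signal that a rank correlation depends on, not a reproduction of the study's analysis.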
Key facts
- Study examines stated-revealed preference gap in 24 language models
- SvR gap is mismatch between endorsed values and contextual choices
- Binary forced-choice prompting entangles preferences with protocol artifacts
- Allowing neutrality and abstention in stated preferences improves Spearman's ρ
- Allowing abstention in revealed preferences reduces ρ to near-zero or negative
- System prompt steering does not reliably improve SvR correlation
- Published on arXiv under ID 2601.21975
- Focus on AI language model behavior
Entities
Institutions
- arXiv