LLMs Fail to Reproduce Human Realization Effect in Risk-Taking
A recent study posted to arXiv (2605.25151) investigates whether large language models (LLMs) exhibit the realization effect, a finding from behavioral economics that risk preferences depend on whether prior gains and losses have been realized or remain on paper. The researchers analyzed LLM behavior at three levels: behavioral sensitivity under prompting alone, linear decoding of internal representations, and causal manipulation via activation steering. Prompt-only analysis revealed consistent sensitivity to the realized-versus-paper condition, but the direction of the shift did not match the pattern observed in humans. A realization-status signal could be linearly decoded from layer 18 of Gemma's residual stream, and it generalized to held-out prompts. However, steering along this signal did not reliably shift downstream risk choices, suggesting that encoding realization status is not enough for the model to reproduce the human effect.
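To make the gap between "decodable" and "causally used" concrete, here is a minimal sketch of a linear readout and of activation steering on synthetic data. It is not the paper's code and does not touch Gemma: the activations, the hidden size `d`, the difference-of-means probe, the steering strength `alpha`, and the toy `risk_choice` readout are all assumptions made for illustration. In particular, the downstream readout is constructed to be orthogonal to the probe direction, which is simply one way a signal could be readable without driving behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 400  # hypothetical hidden size and number of prompts

# Synthetic stand-in for residual-stream activations: the two conditions
# (1 = realized, 0 = paper) are separated along one hidden direction.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * labels - 1, true_dir)

# Fit on the first 300 prompts, evaluate on the remaining 100 (held out).
X_tr, y_tr = acts[:300], labels[:300]
X_te, y_te = acts[300:], labels[300:]

# Linear readout: difference-of-means direction with a midpoint threshold.
mu_realized = X_tr[y_tr == 1].mean(axis=0)
mu_paper = X_tr[y_tr == 0].mean(axis=0)
probe = mu_realized - mu_paper
probe /= np.linalg.norm(probe)
threshold = 0.5 * (mu_realized + mu_paper) @ probe


def decode(x):
    """Predict realization status (1 = realized) from activations."""
    return (x @ probe > threshold).astype(int)


def steer(x, alpha):
    """Activation steering: shift activations by alpha along the probe direction."""
    return x + alpha * probe


# Toy downstream "risk choice" that reads a direction orthogonal to the probe
# (an assumption of this sketch, chosen to show how steering can have no effect).
risk_w = rng.normal(size=d)
risk_w -= (risk_w @ probe) * probe


def risk_choice(x):
    """Return 1 for the risky option, 0 for the safe one (toy readout)."""
    return (x @ risk_w > 0).astype(int)


heldout_acc = float((decode(X_te) == y_te).mean())

# Push held-out "paper" activations toward "realized" and compare outcomes.
paper_te = X_te[y_te == 0]
steered = steer(paper_te, alpha=8.0)
flipped_status = float(decode(steered).mean())
changed_choice = float((risk_choice(steered) != risk_choice(paper_te)).mean())

print(f"held-out probe accuracy:          {heldout_acc:.2f}")
print(f"steered prompts decoded realized: {flipped_status:.2f}")
print(f"risk choices changed by steering: {changed_choice:.2f}")
```

In this toy setup the probe decodes the condition well on held-out samples and steering flips the decoded status, yet the risk choice barely moves because the readout ignores that direction. A real experiment would instead hook the model's layer activations during generation and measure the actual choice distribution.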
Key facts
- Study tests realization effect in LLMs
- Three evaluation levels: prompt-only, linear readout, activation steering
- Prompt-only results show condition sensitivity but wrong direction
- Gemma's residual stream has realization-status signal at layer 18
- Signal generalizes to held-out prompts
- Activation steering does not reliably shift risk choices
- Steering null result holds across conditions
- Paper on arXiv: 2605.25151
Entities
Institutions
- arXiv