LLMs Show Gaps Between Observations, Beliefs, and Actions in Strategic Play
A recent study posted to arXiv (2605.00226) finds that large language models (LLMs) such as Llama 3.1, Qwen3, and gpt-oss struggle with strategic decision-making in incomplete-information games because of two gaps. The first is an observation-belief gap: LLMs hold internal beliefs about hidden game states that are more accurate than what they verbalize, but those beliefs are brittle. Accuracy degrades under multi-hop reasoning, shows primacy and recency biases, and drifts away from Bayesian coherence over extended interactions. The second is a belief-action gap: the models fail to translate their internal beliefs into actions, producing suboptimal choices. Together, these findings help explain why LLMs underperform in applied settings such as negotiation and policymaking.
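To make "Bayesian coherence" concrete, here is a minimal, hypothetical sketch (not taken from the paper) of the normative baseline such beliefs can be compared against: a Bayes filter tracking a hidden opponent card in a toy betting game. The states, likelihood table, and observation sequence are all illustrative assumptions.

```python
import numpy as np

# Toy incomplete-information game: the hidden state is an opponent's card;
# each round yields a noisy observation of the opponent's behavior. A
# Bayes-coherent agent's belief is the normalized product of prior and
# likelihoods; the study reports that LLM beliefs drift away from this
# kind of posterior over long interactions.

HIDDEN_STATES = ["low", "mid", "high"]  # possible opponent cards (assumed)

# P(observation | hidden state): rows are states, columns are observations
# ("pass", "call", "raise"). The values here are illustrative assumptions.
LIKELIHOOD = np.array([
    [0.6, 0.3, 0.1],   # a "low" card mostly passes
    [0.3, 0.4, 0.3],   # a "mid" card is uninformative
    [0.1, 0.3, 0.6],   # a "high" card mostly raises
])
OBS_INDEX = {"pass": 0, "call": 1, "raise": 2}

def bayes_update(prior: np.ndarray, observation: str) -> np.ndarray:
    """Return the posterior over hidden states after one observation."""
    posterior = prior * LIKELIHOOD[:, OBS_INDEX[observation]]
    return posterior / posterior.sum()

belief = np.full(len(HIDDEN_STATES), 1 / len(HIDDEN_STATES))  # uniform prior
for obs in ["raise", "raise", "call"]:
    belief = bayes_update(belief, obs)

print(dict(zip(HIDDEN_STATES, belief.round(3))))
# A coherent agent's posterior is the same no matter the order in which
# these observations arrive; primacy or recency effects show up as
# order-dependent deviations from it.
```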
Key facts
- Study published on arXiv with ID 2605.00226
- Examines LLM decision-making in incomplete-information games
- Identifies an observation-belief gap and a belief-action gap (the latter is illustrated in the sketch after this list)
- Experiments use Llama 3.1, Qwen3, and gpt-oss models
- Internal beliefs are more accurate than verbal reports but brittle
- Belief accuracy degrades with multi-hop reasoning
- Primacy and recency biases affect belief accuracy
- Beliefs drift from Bayesian coherence over extended interactions
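As noted above, the belief-action gap can be illustrated with a simple regret measure: given a belief over hidden states and a payoff table, compare the expected payoff of the model's chosen action against the best action under that same belief. The following sketch uses assumed payoffs and a hypothetical probed belief; it is an illustration of the concept, not the paper's evaluation protocol.

```python
import numpy as np

ACTIONS = ["fold", "call", "raise"]
HIDDEN_STATES = ["low", "mid", "high"]

# PAYOFF[action, hidden state] for the deciding player (assumed values).
PAYOFF = np.array([
    [0.0,  0.0,  0.0],   # fold: no gain, no loss
    [1.0,  0.0, -1.0],   # call wins vs "low", loses vs "high"
    [2.0, -0.5, -2.0],   # raise amplifies both outcomes
])

def expected_payoffs(belief: np.ndarray) -> np.ndarray:
    """Expected payoff of each action under the given belief."""
    return PAYOFF @ belief

def belief_action_gap(belief: np.ndarray, chosen: str) -> float:
    """Regret: best achievable expected payoff minus the chosen action's."""
    ev = expected_payoffs(belief)
    return float(ev.max() - ev[ACTIONS.index(chosen)])

# Example: the probed belief says the opponent is probably "low",
# yet the model folds; the gap quantifies how suboptimal that is
# under the model's own belief.
probed_belief = np.array([0.7, 0.2, 0.1])
print(belief_action_gap(probed_belief, chosen="fold"))  # 1.1
```

A gap of zero means the action is optimal given the stated belief; the study's point is that LLM play often leaves this gap open even when the underlying belief is reasonably accurate.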