LLMs Show Gaps Between Observations, Beliefs, and Actions in Strategic Play
A recent study posted to arXiv (2605.00226) finds that large language models (LLMs) such as Llama 3.1, Qwen3, and gpt-oss struggle with strategic decision-making in incomplete-information games because of two gaps. The first is an observation-belief gap: LLMs hold internal beliefs about hidden game states that are more accurate than what they verbalize, but those beliefs are brittle. Accuracy degrades under multi-hop reasoning, shows primacy and recency biases, and drifts away from Bayesian coherence over extended interactions. The second is a belief-action gap: the models fail to translate their internal beliefs into actions, producing suboptimal choices. Together, these findings help explain why LLMs underperform in applied settings such as negotiation and policymaking.
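To make "Bayesian coherence" concrete, here is a minimal, hypothetical sketch (not taken from the paper) of the normative baseline such beliefs can be compared against: a Bayes filter tracking a hidden opponent card in a toy betting game. The states, likelihood table, and observation sequence are all illustrative assumptions.

```python
import numpy as np

# Toy incomplete-information game: the hidden state is an opponent's card;
# each round yields a noisy observation of the opponent's behavior. A
# Bayes-coherent agent's belief is the normalized product of prior and
# likelihoods; the study reports that LLM beliefs drift away from this
# kind of posterior over long interactions.

HIDDEN_STATES = ["low", "mid", "high"]  # possible opponent cards (assumed)

# P(observation | hidden state): rows are states, columns are observations
# ("pass", "call", "raise"). The values here are illustrative assumptions.
LIKELIHOOD = np.array([
    [0.6, 0.3, 0.1],   # a "low" card mostly passes
    [0.3, 0.4, 0.3],   # a "mid" card is uninformative
    [0.1, 0.3, 0.6],   # a "high" card mostly raises
])
OBS_INDEX = {"pass": 0, "call": 1, "raise": 2}

def bayes_update(prior: np.ndarray, observation: str) -> np.ndarray:
    """Return the posterior over hidden states after one observation."""
    posterior = prior * LIKELIHOOD[:, OBS_INDEX[observation]]
    return posterior / posterior.sum()

belief = np.full(len(HIDDEN_STATES), 1 / len(HIDDEN_STATES))  # uniform prior
for obs in ["raise", "raise", "call"]:
    belief = bayes_update(belief, obs)

print(dict(zip(HIDDEN_STATES, belief.round(3))))
# A coherent agent's posterior is the same no matter the order in which
# these observations arrive; primacy or recency effects show up as
# order-dependent deviations from it.
```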
Key facts
- Study published on arXiv with ID 2605.00226
- Examines LLM decision-making in incomplete-information games
- Identifies an observation-belief gap and a belief-action gap (the latter is illustrated in the sketch after this list)
- Experiments use Llama 3.1, Qwen3, and gpt-oss models
- Internal beliefs are more accurate than verbal reports but brittle
- Belief accuracy degrades with multi-hop reasoning
- Primacy and recency biases affect belief accuracy
- Beliefs drift from Bayesian coherence over extended interactions
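As noted above, the belief-action gap can be illustrated with a simple regret measure: given a belief over hidden states and a payoff table, compare the expected payoff of the model's chosen action against the best action under that same belief. The following sketch uses assumed payoffs and a hypothetical probed belief; it is an illustration of the concept, not the paper's evaluation protocol.

```python
import numpy as np

ACTIONS = ["fold", "call", "raise"]
HIDDEN_STATES = ["low", "mid", "high"]

# PAYOFF[action, hidden state] for the deciding player (assumed values).
PAYOFF = np.array([
    [0.0,  0.0,  0.0],   # fold: no gain, no loss
    [1.0,  0.0, -1.0],   # call wins vs "low", loses vs "high"
    [2.0, -0.5, -2.0],   # raise amplifies both outcomes
])

def expected_payoffs(belief: np.ndarray) -> np.ndarray:
    """Expected payoff of each action under the given belief."""
    return PAYOFF @ belief

def belief_action_gap(belief: np.ndarray, chosen: str) -> float:
    """Regret: best achievable expected payoff minus the chosen action's."""
    ev = expected_payoffs(belief)
    return float(ev.max() - ev[ACTIONS.index(chosen)])

# Example: the probed belief says the opponent is probably "low",
# yet the model folds; the gap quantifies how suboptimal that is
# under the model's own belief.
probed_belief = np.array([0.7, 0.2, 0.1])
print(belief_action_gap(probed_belief, chosen="fold"))  # 1.1
```

A gap of zero means the action is optimal given the stated belief; the study's point is that LLM play often leaves this gap open even when the underlying belief is reasonably accurate.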