Belief Graphs Boost LLM Multi-Agent Reasoning in Hanabi
A new arXiv preprint (2604.23057) examines how explicit belief graphs affect large language models (LLMs) in cooperative multi-agent tasks. Across more than 3,000 controlled trials spanning four LLM families in the card game Hanabi, the study finds that the integration method matters. As prompt context, graphs are largely decorative for stronger models but substantially help weaker models on 2nd-order Theory of Mind (80% vs 10%, p<0.0001, OR=36.0). When used to gate action selection through ranked lists, however, they become structurally essential even for stronger models (100% vs 20% on 2nd-order ToM, p<0.001). The study also documents "Planner Defiance," where some models ignore planner suggestions: Llama 70B overrides the planner 90% of the time, while Gemini models show near-zero defiance.
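To make the distinction concrete, here is a minimal sketch of what "gating action selection through ranked lists" could look like, as opposed to merely placing the graph in the prompt. All names and the scoring rule are hypothetical illustrations, not the paper's implementation: a belief graph maps each agent to cards it is believed to hold, legal actions are ranked by consistency with those beliefs, and the agent must take the top-ranked action.

```python
# Hypothetical sketch of graph-gated action selection; not the paper's code.
# The belief graph maps an agent name to the set of cards it is believed
# to hold. Actions consistent with those beliefs rank higher, and in
# "gating" mode the agent is forced to take the top-ranked action.

def rank_actions(belief_graph, legal_actions):
    """Return legal actions sorted by belief consistency (highest first)."""
    def consistency(action):
        # An action referencing a card the graph attributes to its target
        # is considered belief-consistent (illustrative scoring rule).
        believed = belief_graph.get(action["target"], set())
        return 1.0 if action["card"] in believed else 0.0
    return sorted(legal_actions, key=consistency, reverse=True)

def gated_select(belief_graph, legal_actions):
    """The 'gating' mode: the model must take the top-ranked action."""
    return rank_actions(belief_graph, legal_actions)[0]

# Toy example: agent A believes agent B holds a red 1.
graph = {"B": {"red-1"}}
actions = [
    {"name": "hint-blue", "target": "B", "card": "blue-3"},
    {"name": "hint-red", "target": "B", "card": "red-1"},
]
print(gated_select(graph, actions)["name"])  # prints "hint-red"
```

In prompt-context mode, by contrast, the graph would simply be serialized into the model's input and the model would remain free to pick any legal action, which is why strong models can ignore it there.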
Key facts
- arXiv:2604.23057v1
- 3,000+ controlled trials
- Four LLM families tested
- Hanabi cooperative card game
- Graphs as prompt context: decorative for strong models, beneficial for weak models on 2nd-order ToM (80% vs 10%, p<0.0001, OR=36.0)
- Graphs gating action selection: structurally essential for strong models (100% vs 20% on 2nd-order ToM, p<0.001)
- Planner Defiance: 90% override rate, replicated across N=20 trials
- Gemini models show near-zero defiance; Llama 70B shows 90%
Entities
Institutions
- arXiv