OmniToM Benchmark Tests Theory of Mind in LLMs via Belief Modeling
OmniToM is a new benchmark, introduced in the arXiv preprint arXiv:2605.26322, that evaluates Theory of Mind (ToM) in large language models (LLMs) by requiring them to explicitly model the belief structures of every actor in a narrative. Traditional end-point question answering judges only the final answer to a social reasoning query; OmniToM instead assesses directly whether a model constructs the underlying mental-state representations. The benchmark is built on belief propositions, minimal statements of what an actor believes about the world or about another actor's mental state, which let it analyze knowledge, intentions, emotions, and false beliefs in a common format. This targets a gap in existing evaluation: whether a model's reasoning holds up in scenarios where actors' beliefs diverge, change over time, or are mistaken.
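To make the idea of a belief proposition concrete, here is a minimal Python sketch of one way such statements could be represented. It is an illustration only, not the paper's actual data format: the `Belief` class, the `order` and `is_false_belief` helpers, and the Sally and Anne example are all hypothetical. It assumes a belief is either about the world (a plain string) or about another actor's belief (a nested `Belief`).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Belief:
    """Hypothetical belief proposition: `holder` believes `content`.

    `content` is either a world-state statement (str) or another
    actor's belief (a nested Belief), which gives higher-order beliefs.
    """
    holder: str
    content: Union[str, "Belief"]

    def order(self) -> int:
        # 1 for a belief about the world, +1 per level of nesting.
        if isinstance(self.content, str):
            return 1
        return 1 + self.content.order()


def is_false_belief(belief: Belief, world_facts: set[str]) -> bool:
    """Return True if a first-order belief contradicts the known facts.

    Simplifying assumption: a belief is "false" when its statement is
    not among the true facts. Higher-order beliefs are not judged here.
    """
    return isinstance(belief.content, str) and belief.content not in world_facts


# Illustrative scenario: the marble was moved while Sally was away.
world = {"marble in box"}
sally = Belief("Sally", "marble in basket")   # outdated, mistaken belief
anne = Belief("Anne", "marble in box")        # matches the world
anne_about_sally = Belief("Anne", sally)      # second-order belief
```

Under these assumptions, `is_false_belief(sally, world)` flags Sally's belief as mistaken, while `anne_about_sally.order()` reports a second-order belief. Storing each actor's beliefs in one shared shape is what would let a single check cover divergent and nested mental states alike.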
Key facts
- OmniToM benchmarks Theory of Mind in LLMs via explicit belief modeling
- Uses belief propositions to represent mental states
- Evaluates knowledge, intentions, emotions, and false beliefs
- Addresses limitations of end-point question answering
- Published on arXiv with ID 2605.26322
Entities
Institutions
- arXiv