OmniToM Benchmark Tests Theory of Mind in LLMs via Belief Modeling

ai-technology · 2026-05-27

OmniToM is a new benchmark introduced in arXiv:2605.26322 that evaluates Theory of Mind (ToM) in large language models (LLMs) by requiring them to explicitly model the belief structures of every actor in a narrative. Traditional end-point question answering judges only the final answer to a social reasoning query; OmniToM instead assesses directly whether a model constructs the underlying mental-state representations. The benchmark is built on belief propositions, minimal statements of what an actor believes about the world or about another actor's mental state, which let it analyze knowledge, intentions, emotions, and false beliefs in a common format. This addresses a gap in existing evaluation: whether a model's reasoning holds up in scenarios where beliefs diverge, evolve, or are mistaken. The research is published on arXiv.
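
As an illustration only, a belief proposition can be pictured as a small recursive data structure: an actor paired with either a statement about the world or another actor's belief. The sketch below is not OmniToM's actual format, which this summary does not specify. The names Fact, Belief, order, and is_false_belief are hypothetical, and the scenario is the classic Sally-Anne false-belief task rather than an item from the benchmark.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Fact:
    """A plain statement about the world (hypothetical representation)."""
    statement: str


@dataclass(frozen=True)
class Belief:
    """An actor holds `content`: a fact, or another actor's belief."""
    actor: str
    content: Union[Fact, "Belief"]

    def order(self) -> int:
        """Nesting depth: 1 for a belief about the world, 2 for a belief about a belief."""
        if isinstance(self.content, Belief):
            return 1 + self.content.order()
        return 1


def is_false_belief(belief: Belief, world: set) -> bool:
    """A first-order belief is false when its fact is not part of the true world state."""
    return isinstance(belief.content, Fact) and belief.content not in world


# Sally-Anne scenario (assumed example): the ball was moved to the box while Sally was away.
world = {Fact("the ball is in the box")}
sally = Belief("Sally", Fact("the ball is in the basket"))  # outdated, hence false
anne = Belief("Anne", Fact("the ball is in the box"))       # matches the world
anne_about_sally = Belief("Anne", sally)                    # second-order belief
```

Here is_false_belief(sally, world) is true while is_false_belief(anne, world) is false, and anne_about_sally.order() is 2. Keeping one structure per actor, rather than a single final answer, is what makes divergent or mistaken beliefs visible to a scorer.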

Key facts

OmniToM benchmarks Theory of Mind in LLMs via explicit belief modeling
Uses belief propositions to represent mental states
Evaluates knowledge, intentions, emotions, and false beliefs
Addresses limitations of end-point question answering
Published on arXiv with ID 2605.26322
