ARTFEED — Contemporary Art Intelligence

OmniToM Benchmark Tests Theory of Mind in LLMs via Belief Modeling

ai-technology · 2026-05-27

OmniToM is a new benchmark, introduced in a paper published on arXiv (arXiv:2605.26322), that evaluates Theory of Mind (ToM) in large language models (LLMs) by requiring them to explicitly model the belief structures of every actor in a narrative. Traditional end-point question answering judges only the final answer to a social reasoning query; OmniToM instead assesses directly whether a model constructs the underlying mental-state representations. The benchmark uses belief propositions (minimal statements of what an actor believes about the world or about another actor's mental state) to analyze knowledge, intentions, emotions, and false beliefs in a common format. This targets a gap in current evaluation: whether models reason robustly in scenarios with divergent, evolving, or mistaken beliefs.
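To make the idea of a belief proposition concrete, here is a minimal Python sketch of how such statements might be represented and compared. It is an illustration only: the `BeliefProposition` fields, the `is_false_belief` check, and the exact-match `proposition_f1` score are all hypothetical and are not taken from the OmniToM paper, whose actual schema and metrics are not described here. The story used is the classic Sally-Anne false-belief task, not a benchmark item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BeliefProposition:
    """One minimal statement of what an actor holds about the world or another mind.

    All field names are hypothetical; this is not OmniToM's schema.
    """
    holder: str                   # actor who holds the mental state
    kind: str                     # "knowledge", "intention", "emotion", or "belief"
    content: str                  # the proposition itself, in plain text
    about: Optional[str] = None   # another actor, for nested (second-order) states


def is_false_belief(prop: BeliefProposition, world_facts: set[str]) -> bool:
    """Flag first-order beliefs whose content is not among the story's true facts.

    Nested beliefs (about another actor) are skipped, because their truth
    depends on that actor's mental state rather than on the world.
    """
    return (
        prop.kind == "belief"
        and prop.about is None
        and prop.content not in world_facts
    )


def proposition_f1(predicted: set[BeliefProposition],
                   gold: set[BeliefProposition]) -> float:
    """Exact-match F1 between predicted and reference propositions (assumed metric)."""
    hits = len(predicted & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


# Sally puts the marble in the basket and leaves; Anne moves it to the box.
world_facts = {"marble is in the box"}

gold = {
    BeliefProposition("Sally", "belief", "marble is in the basket"),
    BeliefProposition("Anne", "knowledge", "marble is in the box"),
    BeliefProposition("Anne", "belief", "Sally thinks marble is in the basket",
                      about="Sally"),
}

# Hypothetical model output: tracks Anne correctly, misses Sally's outdated belief.
predicted = {
    BeliefProposition("Sally", "belief", "marble is in the box"),
    BeliefProposition("Anne", "knowledge", "marble is in the box"),
}

false_beliefs = [p for p in gold if is_false_belief(p, world_facts)]
score = proposition_f1(predicted, gold)
```

In this sketch a model could still answer an end-point question about where the marble really is, yet its belief set matches only one of the three reference propositions. Scoring the propositions themselves is what exposes the missing false belief, which is the contrast with final-answer evaluation that the article describes.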

Key facts

  • OmniToM benchmarks Theory of Mind in LLMs via explicit belief modeling
  • Uses belief propositions to represent mental states
  • Evaluates knowledge, intentions, emotions, and false beliefs
  • Addresses limitations of end-point question answering
  • Published on arXiv with ID 2605.26322

Entities

Institutions

  • arXiv
