OSCToM: RL-Based Method for High-Order Theory of Mind in LLMs

ai-technology · 2026-05-22

A new paper on arXiv introduces OSCToM (Observer-Self Conflict Theory of Mind), a method for generating adversarial examples that test high-order Theory of Mind reasoning in Large Language Models. The approach focuses on nested belief conflicts where an observer's view of another agent contradicts the observer's own belief state, requiring recursive multi-layered reasoning beyond simple perspective-taking. OSCToM combines reinforcement learning, an extended domain-specific language, and compositional surrogate models to produce these conflicts. In experiments, OSCToM-8B achieved the best overall performance among tested systems, improving on ExploreToM results on the FANToM benchmark. The paper is available at https://arxiv.org/abs/2605.20423.
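The observer-self conflict described above can be illustrated with a toy belief tracker. This is a minimal sketch, not the paper's domain-specific language or its reinforcement learning pipeline: the `Move` record, the overt/covert observer split, and the `nested_belief` and `observer_self_conflict` helpers are all hypothetical names introduced here for illustration. It assumes a single object whose location changes, and that an agent only credits another agent with seeing an event if that agent watched openly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    """One event: the object ends up at `location` (hypothetical toy model)."""
    location: str
    overt: frozenset                 # agents who watched openly
    covert: frozenset = frozenset()  # agents who watched without others knowing


def nested_belief(events, chain):
    """Return what chain[0] believes chain[1] believes ... about the location.

    An event updates the nested belief only if the outermost agent saw it
    (openly or secretly) and every inner agent in the chain watched openly,
    so that their presence is common knowledge to the agents above them.
    """
    belief = None
    for ev in events:
        first_saw = chain[0] in ev.overt or chain[0] in ev.covert
        rest_known = all(agent in ev.overt for agent in chain[1:])
        if first_saw and rest_known:
            belief = ev.location
    return belief


def observer_self_conflict(events, observer, other):
    """True when the observer's own belief differs from the belief
    the observer attributes to the other agent."""
    own = nested_belief(events, [observer])
    attributed = nested_belief(events, [observer, other])
    return own != attributed


# Toy story: both agents see the ball in the basket; Anne then moves it to
# the box while Sally secretly watches.
events = [
    Move("basket", overt=frozenset({"Sally", "Anne"})),
    Move("box", overt=frozenset({"Anne"}), covert=frozenset({"Sally"})),
]

print(nested_belief(events, ["Anne"]))                    # box
print(nested_belief(events, ["Anne", "Sally"]))           # basket
print(nested_belief(events, ["Sally", "Anne", "Sally"]))  # basket
print(observer_self_conflict(events, "Anne", "Sally"))    # True
```

In this toy story Anne believes the ball is in the box but thinks Sally still believes it is in the basket, which is the kind of observer-self mismatch the summary describes. The three-agent chain shows how the same bookkeeping extends to higher-order questions such as what Sally thinks Anne thinks Sally believes.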

Key facts

OSCToM stands for Observer-Self Conflict Theory of Mind.
The method uses reinforcement learning to generate adversarial examples.
It targets recursive beliefs and information asymmetries in social reasoning.
OSCToM-8B achieved the best overall performance among the tested systems, improving on ExploreToM's results on the FANToM benchmark.
The approach combines an extended domain-specific language with compositional surrogate models.
Existing benchmarks like ExploreToM do not fully test nested belief conflicts.
The paper is available on arXiv under ID 2605.20423.
The method addresses observer-self belief conflicts in LLMs.
