MirrorBench: A New Benchmark for Self-Centric Intelligence in MLLMs
Researchers have introduced MirrorBench, a simulation-based benchmark for assessing self-centric intelligence in multimodal large language models (MLLMs). Inspired by the Mirror Self-Recognition (MSR) test from psychology, MirrorBench organizes evaluation into a tiered framework of progressively challenging tasks, from basic visual perception up to high-level self-representation. Experiments on prominent MLLMs show that even at the most fundamental tier their performance falls well short of human capability, exposing critical gaps in self-awareness. The benchmark fills a gap left by existing evaluations, which concentrate predominantly on perception of and interaction with external objects. The study is available on arXiv under identifier 2604.14785.
Key facts
- MirrorBench is a simulation-based benchmark for MLLMs.
- It is inspired by the Mirror Self-Recognition (MSR) test in psychology.
- The benchmark uses a tiered framework of progressively challenging tasks (see the illustrative sketch after this list).
- Tasks range from basic visual perception to high-level self-representation.
- Experiments show MLLMs perform substantially worse than humans even at the lowest level.
- The benchmark addresses a lack of systematic evaluation of self-centric intelligence.
- Current benchmarks mainly target perception and interaction with external objects.
- The study is published on arXiv with identifier 2604.14785.
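To make the tiered setup concrete, here is a minimal sketch of what a progressively tiered evaluation harness could look like. It is an illustration only: the `Task` and `Tier` classes, the three example tiers, and the exact-match scoring are assumptions for the sake of the example, not MirrorBench's actual task design or metrics.

```python
# Hypothetical sketch of a tiered benchmark harness in the spirit of
# MirrorBench. All names and tiers below are illustrative assumptions,
# not the paper's actual implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """A single benchmark item: a prompt plus a reference answer."""
    prompt: str
    reference: str


@dataclass
class Tier:
    """A named difficulty level holding a set of tasks."""
    name: str
    tasks: List[Task]


def evaluate_model(model: Callable[[str], str], tiers: List[Tier]) -> Dict[str, float]:
    """Run the model on every task and report per-tier accuracy.

    Tiers are evaluated in order, mirroring a progression from
    low-level perception to high-level self-representation.
    """
    scores: Dict[str, float] = {}
    for tier in tiers:
        correct = sum(
            model(task.prompt).strip().lower() == task.reference.lower()
            for task in tier.tasks
        )
        scores[tier.name] = correct / len(tier.tasks)
    return scores


def always_yes(prompt: str) -> str:
    """Trivial stand-in 'model' that answers 'yes' to everything."""
    return "yes"


if __name__ == "__main__":
    # Toy tiers standing in for the benchmark's real task hierarchy.
    tiers = [
        Tier("visual_perception",
             [Task("Is the agent's arm visible? (yes/no)", "yes")]),
        Tier("self_localization",
             [Task("Which figure in the mirror is the agent? (left/right)", "left")]),
        Tier("self_representation",
             [Task("Does the mirrored mark belong to the agent? (yes/no)", "yes")]),
    ]
    print(evaluate_model(always_yes, tiers))
```

Reporting a separate score per tier, rather than a single aggregate, is what lets this kind of benchmark show where models break down, e.g. MirrorBench's finding that MLLMs already underperform humans at the lowest level.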