MirrorBench: A New Benchmark for Self-Centric Intelligence in MLLMs
Researchers have introduced MirrorBench, a simulation-based benchmark for assessing self-centric intelligence in multimodal large language models (MLLMs). Inspired by the Mirror Self-Recognition (MSR) test from psychology, MirrorBench organizes evaluation into a tiered framework of progressively challenging tasks, from basic visual perception up to high-level self-representation. Experiments on prominent MLLMs show that even at the most fundamental tier their performance falls well short of human capability, exposing critical gaps in self-awareness. The benchmark fills a gap left by existing evaluations, which concentrate predominantly on perception of and interaction with external objects. The study is available on arXiv under identifier 2604.14785.
Key facts
- MirrorBench is a simulation-based benchmark for MLLMs.
- It is inspired by the Mirror Self-Recognition (MSR) test in psychology.
- The benchmark uses a tiered framework of progressively challenging tasks (see the illustrative sketch after this list).
- Tasks range from basic visual perception to high-level self-representation.
- Experiments show MLLMs perform substantially worse than humans even at the lowest level.
- The benchmark addresses a lack of systematic evaluation of self-centric intelligence.
- Current benchmarks mainly target perception and interaction with external objects.
- The study is published on arXiv with identifier 2604.14785.
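To make the tiered setup concrete, here is a minimal sketch of what a progressively tiered evaluation harness could look like. It is an illustration only: the `Task` and `Tier` classes, the three example tiers, and the exact-match scoring are assumptions for the sake of the example, not MirrorBench's actual task design or metrics.

```python
# Hypothetical sketch of a tiered benchmark harness in the spirit of
# MirrorBench. All names and tiers below are illustrative assumptions,
# not the paper's actual implementation.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    """A single benchmark item: a prompt plus a reference answer."""
    prompt: str
    reference: str


@dataclass
class Tier:
    """A named difficulty level holding a set of tasks."""
    name: str
    tasks: List[Task]


def evaluate_model(model: Callable[[str], str], tiers: List[Tier]) -> Dict[str, float]:
    """Run the model on every task and report per-tier accuracy.

    Tiers are evaluated in order, mirroring a progression from
    low-level perception to high-level self-representation.
    """
    scores: Dict[str, float] = {}
    for tier in tiers:
        correct = sum(
            model(task.prompt).strip().lower() == task.reference.lower()
            for task in tier.tasks
        )
        scores[tier.name] = correct / len(tier.tasks)
    return scores


def always_yes(prompt: str) -> str:
    """Trivial stand-in 'model' that answers 'yes' to everything."""
    return "yes"


if __name__ == "__main__":
    # Toy tiers standing in for the benchmark's real task hierarchy.
    tiers = [
        Tier("visual_perception",
             [Task("Is the agent's arm visible? (yes/no)", "yes")]),
        Tier("self_localization",
             [Task("Which figure in the mirror is the agent? (left/right)", "left")]),
        Tier("self_representation",
             [Task("Does the mirrored mark belong to the agent? (yes/no)", "yes")]),
    ]
    print(evaluate_model(always_yes, tiers))
```

Reporting a separate score per tier, rather than a single aggregate, is what lets this kind of benchmark show where models break down, e.g. MirrorBench's finding that MLLMs already underperform humans at the lowest level.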