ARTFEED — Contemporary Art Intelligence

ReactBench Benchmark Exposes MLLM Limitations in Topological Reasoning on Chemical Diagrams

ai-technology · 2026-04-20

A new benchmark called ReactBench reveals significant shortcomings in the ability of Multimodal Large Language Models (MLLMs) to reason about complex topological structures. These models struggle with diagrams that feature branching paths, converging flows, and cyclic dependencies, failing even at basic tasks such as counting endpoints. Chemical reaction diagrams serve as the testing ground because they naturally span diverse structures, from linear chains to cyclic graphs. The benchmark comprises 1,618 expert-annotated question-answer pairs organized across four hierarchical task dimensions. Whereas existing evaluation methods have focused primarily on semantic comprehension, ReactBench targets structural reasoning. An extensive evaluation of 17 MLLMs demonstrates these fundamental limitations in visual reasoning. The work was announced on arXiv under identifier 2604.15994v1.
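To make the kind of question at stake concrete, the sketch below frames a reaction diagram as a directed graph and answers two topological queries the article mentions: how many endpoints (terminal products) the network has, and whether it contains a cyclic dependency. The graph and function names are illustrative assumptions, not taken from ReactBench itself.

```python
# Hypothetical sketch: topological questions over a reaction diagram
# modeled as a directed graph of (reactant -> product) edges.
from collections import defaultdict

def analyze_topology(edges):
    """Return (endpoint_count, has_cycle) for a directed reaction graph."""
    succ = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        nodes.update((src, dst))

    # Endpoints: species with no outgoing reactions (out-degree 0).
    endpoints = [n for n in nodes if not succ[n]]

    # Cycle detection via iterative DFS with a three-color scheme:
    # GRAY marks nodes on the current path; reaching a GRAY node is a cycle.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def has_cycle_from(start):
        stack = [(start, iter(succ[start]))]
        color[start] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
        return False

    cyclic = any(color[n] == WHITE and has_cycle_from(n) for n in nodes)
    return len(endpoints), cyclic

# Branching pathway A -> B and A -> C, plus a catalytic cycle C <-> D.
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")]
print(analyze_topology(edges))  # (1, True): one endpoint (B), one cycle
```

The point of the sketch is that these queries are trivial once the graph is explicit; the benchmark's finding is that MLLMs struggle to extract that structure from the diagram image in the first place.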

Key facts

  • ReactBench is a new benchmark for evaluating MLLMs
  • MLLMs struggle with complex topological structures in diagrams
  • Chemical reaction diagrams are used as test cases
  • The benchmark contains 1,618 expert-annotated QA pairs
  • Evaluation covers four hierarchical task dimensions
  • 17 MLLMs were extensively evaluated
  • Existing benchmarks focus on semantic comprehension rather than structural reasoning
  • Research was announced on arXiv with identifier 2604.15994v1

Entities

Institutions

  • arXiv

Sources