ARTFEED — Contemporary Art Intelligence

ReactBench Benchmark Exposes MLLM Limitations in Topological Reasoning on Chemical Diagrams

ai-technology · 2026-04-20

A new benchmark called ReactBench reveals significant shortcomings in the ability of Multimodal Large Language Models (MLLMs) to reason about complex topological structures. These models struggle with diagrams that feature branching paths, converging flows, and cyclic dependencies, failing even at basic tasks such as counting endpoints. Chemical reaction diagrams serve as the testing ground because they naturally span diverse structures, from linear chains to cyclic graphs. The benchmark comprises 1,618 expert-annotated question-answer pairs organized across four hierarchical task dimensions. Whereas existing evaluation methods have focused primarily on semantic comprehension, ReactBench targets structural reasoning. An extensive evaluation of 17 MLLMs demonstrates these fundamental limitations in visual reasoning. The work was announced on arXiv under identifier 2604.15994v1.
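To make the kind of question at stake concrete, the sketch below frames a reaction diagram as a directed graph and answers two topological queries the article mentions: how many endpoints (terminal products) the network has, and whether it contains a cyclic dependency. The graph and function names are illustrative assumptions, not taken from ReactBench itself.

```python
# Hypothetical sketch: topological questions over a reaction diagram
# modeled as a directed graph of (reactant -> product) edges.
from collections import defaultdict

def analyze_topology(edges):
    """Return (endpoint_count, has_cycle) for a directed reaction graph."""
    succ = defaultdict(list)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        nodes.update((src, dst))

    # Endpoints: species with no outgoing reactions (out-degree 0).
    endpoints = [n for n in nodes if not succ[n]]

    # Cycle detection via iterative DFS with a three-color scheme:
    # GRAY marks nodes on the current path; reaching a GRAY node is a cycle.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def has_cycle_from(start):
        stack = [(start, iter(succ[start]))]
        color[start] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
        return False

    cyclic = any(color[n] == WHITE and has_cycle_from(n) for n in nodes)
    return len(endpoints), cyclic

# Branching pathway A -> B and A -> C, plus a catalytic cycle C <-> D.
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "C")]
print(analyze_topology(edges))  # (1, True): one endpoint (B), one cycle
```

The point of the sketch is that these queries are trivial once the graph is explicit; the benchmark's finding is that MLLMs struggle to extract that structure from the diagram image in the first place.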

Key facts

  • ReactBench is a new benchmark for evaluating MLLMs
  • MLLMs struggle with complex topological structures in diagrams
  • Chemical reaction diagrams are used as test cases
  • The benchmark contains 1,618 expert-annotated QA pairs
  • Evaluation covers four hierarchical task dimensions
  • 17 MLLMs were extensively evaluated
  • Existing benchmarks focus on semantic comprehension rather than structural reasoning
  • Research was announced on arXiv with identifier 2604.15994v1

Entities

Institutions

  • arXiv

Sources