ARTFEED — Contemporary Art Intelligence

DRAGON Benchmark Tests Visual Grounding in Diagram QA

ai-technology · 2026-04-30

Researchers have launched DRAGON, a new benchmark designed to assess evidence-grounded visual reasoning over diagrams. Diagram question answering (DQA) asks models to interpret structured visual formats such as charts, maps, infographics, circuit schematics, and scientific diagrams. Recent vision-language models (VLMs) often answer these questions with high accuracy, but a correct answer does not guarantee that a model actually used the diagram regions that support its prediction. Models may instead rely on textual correlations or dataset artifacts, bypassing the visual evidence needed to verify an answer. This shortcoming makes diagram reasoning hard to evaluate and undermines interpretability. DRAGON addresses it by requiring models to predict bounding boxes for the visual elements that support the answer, including answer-bearing components and relevant text regions. The benchmark aims to provide a more rigorous evaluation of how well AI systems genuinely understand diagrams.
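
DRAGON's exact scoring protocol is not spelled out here, but benchmarks of this kind commonly score predicted evidence boxes against annotated ones with intersection-over-union (IoU) matching. The sketch below is a minimal, assumed version of such a metric: the [x1, y1, x2, y2] box format, the 0.5 IoU threshold, and the greedy one-to-one matching rule are illustrative assumptions, not DRAGON's specification.

    # Minimal sketch of IoU-based grounding evaluation (assumed metric,
    # not DRAGON's published protocol). Boxes are [x1, y1, x2, y2].

    def box_iou(a, b):
        """Intersection-over-union of two axis-aligned boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def grounding_f1(predicted, gold, iou_threshold=0.5):
        """Greedily match each predicted box to an unmatched gold evidence
        box; a prediction is a hit at IoU >= iou_threshold. Returns
        precision, recall, F1."""
        matched, hits = set(), 0
        for p in predicted:
            best_j, best_iou = None, iou_threshold
            for j, g in enumerate(gold):
                iou = box_iou(p, g)
                if j not in matched and iou >= best_iou:
                    best_j, best_iou = j, iou
            if best_j is not None:
                matched.add(best_j)
                hits += 1
        precision = hits / len(predicted) if predicted else 0.0
        recall = hits / len(gold) if gold else 0.0
        denom = precision + recall
        return precision, recall, (2 * precision * recall / denom if denom else 0.0)

    # Example: one predicted box against two annotated evidence regions.
    pred = [[100, 80, 200, 140]]
    gold = [[120, 88, 190, 132], [118, 135, 200, 150]]
    print(grounding_f1(pred, gold))  # (1.0, 0.5, 0.666...)

Under an assumed metric like this, a model earns credit only when its predicted boxes overlap the annotated evidence, so a correct answer with no grounding scores zero.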

Key facts

  • DRAGON is a benchmark for evidence-grounded visual reasoning in diagrams.
  • It evaluates models on predicting bounding boxes for visual evidence supporting answers.
  • Diagram question answering involves structured visuals like charts, maps, and schematics.
  • Current VLMs often achieve high accuracy without proper visual grounding.
  • Models may rely on textual correlations or dataset artifacts.
  • The benchmark requires models to identify visual elements that justify the answer.
  • Evidence regions can include answer-bearing components and supporting text (see the example record after this list).
  • DRAGON aims to improve reliability and interpretability of diagram reasoning.
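
To make the evidence-grounding requirement concrete, a single item could pair a question and answer with the boxes that justify them. The record below is a hypothetical illustration; the field names, roles, and coordinates are assumptions, not DRAGON's published schema.

    # Hypothetical evidence-grounded DQA record (illustrative only;
    # not DRAGON's actual data format). Boxes are [x1, y1, x2, y2] pixels.
    example = {
        "image": "circuit_042.png",              # assumed file name
        "question": "What is the resistance of R2?",
        "answer": "470 ohms",
        "evidence_boxes": [
            {"bbox": [120, 88, 190, 132], "role": "answer-bearing component"},
            {"bbox": [118, 135, 200, 150], "role": "supporting text"},
        ],
    }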
