PolyChartQA Benchmark Tests Multi-Chart AI Understanding
Researchers have introduced PolyChartQA, a mid-scale dataset designed to evaluate question answering over multi-chart images. The dataset comprises 534 multi-chart images containing 2,297 sub-charts sourced from peer-reviewed computer science publications, along with 2,694 QA pairs. Nine state-of-the-art Multimodal Language Models (MLMs) were tested, revealing a 27.4% drop in L-Accuracy (an LLM-based accuracy metric) on human-authored questions compared to MLM-generated ones. A proposed prompting method yielded a 5.39% gain in L-Accuracy. The work addresses the underexplored area of multi-chart understanding in real-world contexts.
Key facts
- PolyChartQA is a mid-scale dataset for question answering over multi-chart images.
- It includes 534 multi-chart images with 2,297 sub-charts from peer-reviewed computer science publications.
- The dataset contains 2,694 QA pairs.
- Nine state-of-the-art Multimodal Language Models (MLMs) were evaluated.
- A 27.4% drop in L-Accuracy was observed on human-authored questions versus MLM-generated questions.
- A proposed prompting method improved L-Accuracy by 5.39%.
- The research highlights the challenge of interpreting multiple related charts together.
- The study is published on arXiv.