ARTFEED — Contemporary Art Intelligence

PolyChartQA Benchmark Tests Multi-Chart AI Understanding

other · 2026-04-25

Researchers have introduced PolyChartQA, a mid-scale dataset designed to evaluate question answering over multi-chart images. The dataset comprises 534 multi-chart images containing 2,297 sub-charts drawn from peer-reviewed computer science publications, paired with 2,694 QA pairs. Nine state-of-the-art Multimodal Language Models (MLMs) were evaluated, revealing a 27.4% drop in LLM-judged accuracy (L-Accuracy) on human-authored questions compared with MLM-generated ones. A proposed prompting method yielded a 5.39% L-Accuracy gain. The work addresses the underexplored problem of understanding multiple related charts together in real-world contexts.

Key facts

  • PolyChartQA is a mid-scale dataset for question answering over multi-chart images.
  • It includes 534 multi-chart images with 2,297 sub-charts from peer-reviewed computer science publications.
  • The dataset contains 2,694 QA pairs.
  • Nine state-of-the-art Multimodal Language Models (MLMs) were evaluated.
  • A 27.4% drop in L-Accuracy (LLM-judged accuracy) was observed on human-authored questions vs. MLM-generated questions.
  • A proposed prompting method improved L-Accuracy by 5.39%.
  • The research highlights the challenge of interpreting multiple related charts together.
  • The study is published on arXiv.
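To make the metric in the facts above concrete, here is a minimal sketch of how an "L-Accuracy" style score and the gap between question sources could be computed. The judge outputs and variable names are illustrative placeholders, not the paper's actual evaluation pipeline or its real data.

```python
# Hypothetical sketch: computing an L-Accuracy-style metric and the gap
# between MLM-generated and human-authored questions.
# All values below are toy placeholders, not results from the paper.

def l_accuracy(judgments):
    """Fraction of answers an LLM judge marked correct (1) vs. incorrect (0)."""
    return sum(judgments) / len(judgments)

# Toy judge verdicts for the two question sources (placeholder values).
mlm_generated = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]   # 8 of 10 judged correct
human_authored = [1, 0, 1, 0, 1, 0, 1, 0, 0, 1]  # 5 of 10 judged correct

drop = l_accuracy(mlm_generated) - l_accuracy(human_authored)
print(f"L-Accuracy drop on human-authored questions: {drop:.1%}")
# → L-Accuracy drop on human-authored questions: 30.0%
```

In the study, the same kind of comparison over the benchmark's 2,694 QA pairs is what surfaces the reported 27.4% gap.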

Entities

Institutions

  • arXiv

Sources