PolyChartQA Benchmark Tests Multi-Chart AI Understanding
Researchers have introduced PolyChartQA, a mid-scale dataset designed to evaluate question answering over multi-chart images. The dataset comprises 534 multi-chart images containing 2,297 sub-charts sourced from peer-reviewed computer science publications, along with 2,694 QA pairs. Nine state-of-the-art Multimodal Language Models (MLMs) were tested, revealing a 27.4% drop in L-Accuracy (an LLM-based accuracy metric) on human-authored questions compared to MLM-generated ones. A proposed prompting method yielded a 5.39% gain in L-Accuracy. The work addresses the underexplored area of multi-chart understanding in real-world contexts.
Key facts
- PolyChartQA is a mid-scale dataset for question answering over multi-chart images.
- It includes 534 multi-chart images with 2,297 sub-charts from peer-reviewed computer science publications.
- The dataset contains 2,694 QA pairs.
- Nine state-of-the-art Multimodal Language Models (MLMs) were evaluated.
- A 27.4% drop in L-Accuracy was observed on human-authored questions versus MLM-generated questions.
- A proposed prompting method improved L-Accuracy by 5.39%.
- The research highlights the challenge of interpreting multiple related charts together.
- The study is published on arXiv.