InterChart Benchmark Tests VLMs on Multi-Chart Reasoning
Researchers have introduced InterChart, a diagnostic benchmark for assessing how well vision-language models (VLMs) reason across interconnected charts. This capability matters for practical applications such as scientific reporting, financial analysis, and public-policy dashboards. Unlike prior benchmarks built around isolated, visually similar charts, InterChart poses diverse question types, including entity inference, trend correlation, numerical estimation, and complex multi-step reasoning grounded in two to three related charts. The benchmark spans three difficulty tiers: factual reasoning over individual charts, integrative analysis across aligned chart sets, and semantic inference over visually complex, real-world chart pairs. Evaluations show that both open- and closed-source VLMs suffer steep accuracy drops as chart complexity rises.
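A tiered benchmark like this is typically scored by grouping questions per difficulty tier and reporting accuracy for each. The sketch below illustrates that idea only; the tier names, record fields, and model stub are assumptions for illustration, not InterChart's actual data format or API.

```python
from collections import defaultdict

# Hypothetical question records; fields and tier labels are illustrative,
# not the benchmark's real schema.
QUESTIONS = [
    {"tier": "factual-single", "question": "Peak value in chart A?", "answer": "42"},
    {"tier": "integrative-aligned", "question": "Which series rises in both charts?", "answer": "exports"},
    {"tier": "semantic-real-world", "question": "Do the two dashboards agree on the 2020 trend?", "answer": "no"},
]

def dummy_vlm(question: str) -> str:
    """Stand-in for a real VLM call; always answers '42'."""
    return "42"

def accuracy_by_tier(questions, model):
    """Group questions by tier and compute exact-match accuracy per tier."""
    correct, total = defaultdict(int), defaultdict(int)
    for q in questions:
        total[q["tier"]] += 1
        if model(q["question"]).strip().lower() == q["answer"].lower():
            correct[q["tier"]] += 1
    return {tier: correct[tier] / total[tier] for tier in total}

print(accuracy_by_tier(QUESTIONS, dummy_vlm))
```

With the stub model above, accuracy is nonzero only on the tier whose answer happens to be "42", mirroring how per-tier reporting exposes where a model's reasoning degrades.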
Key facts
- InterChart is a diagnostic benchmark for vision-language models.
- It evaluates reasoning across multiple related charts.
- Tasks include entity inference, trend correlation, numerical estimation, and multi-step reasoning.
- The benchmark has three tiers of increasing difficulty.
- Tiers cover individual charts, synthetically aligned chart sets, and real-world chart pairs.
- State-of-the-art VLMs show steep accuracy declines with complexity.
- InterChart targets applications in scientific reporting, financial analysis, and policy dashboards.
- The benchmark was introduced in arXiv:2508.07630v2.
Entities
Institutions
- arXiv