MuDABench: New Benchmark for Multi-Document Analytical QA
Researchers have introduced MuDABench, a benchmark for multi-document analytical question answering (QA) over large, semi-structured document collections. Unlike existing multi-document QA benchmarks, which draw on a small number of documents and require little cross-document reasoning, MuDABench demands analysis and synthesis across many documents. It spans over 80,000 pages and contains 332 analytical QA instances, constructed via distant supervision over document-level metadata and annotated financial databases. The benchmark also introduces an evaluation protocol that scores final-answer accuracy and reports intermediate-fact coverage as an auxiliary diagnostic signal. Experiments show that standard retrieval-augmented generation (RAG) systems struggle with this task. The paper is available on arXiv.
Key facts
- MuDABench is a benchmark for multi-document analytical QA
- It covers over 80,000 pages and 332 QA instances
- Constructed via distant supervision using metadata and financial databases
- Requires extensive inter-document analysis and aggregation
- Evaluation measures answer accuracy and intermediate-fact coverage
- Standard RAG systems perform poorly on this benchmark
- Paper published on arXiv with ID 2604.22239
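The summary does not specify how intermediate-fact coverage is computed in MuDABench. As a rough illustration only, such a diagnostic can be sketched as recall over a set of annotated gold facts, checking whether each appears in the model's output; the function name and the substring-matching criterion below are assumptions, not the benchmark's actual protocol (which might use fuzzy matching or entailment instead):

```python
def fact_coverage(gold_facts: list[str], predicted_text: str) -> float:
    """Fraction of gold intermediate facts mentioned in a model's output.

    Hypothetical sketch: uses case-insensitive substring matching as the
    hit criterion; a real protocol could use entailment or fuzzy matching.
    Returns 0.0 when there are no gold facts to check.
    """
    if not gold_facts:
        return 0.0
    text = predicted_text.lower()
    hits = sum(1 for fact in gold_facts if fact.lower() in text)
    return hits / len(gold_facts)


# Example: one of two gold facts appears in the model's answer.
coverage = fact_coverage(
    ["revenue rose 12%", "net income declined"],
    "The filings show that revenue rose 12% year over year.",
)
```

A coverage score well below the final-answer accuracy would suggest the model reaches answers without surfacing the supporting evidence, which is the kind of diagnostic signal the protocol is described as providing.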
Entities
Institutions
- arXiv