MuDABench: New Benchmark for Multi-Document Analytical QA
Researchers have introduced MuDABench, a benchmark for multi-document analytical question answering (QA) over large, semi-structured document collections. Unlike existing multi-document QA benchmarks, which draw on a small number of documents and require little cross-document reasoning, MuDABench demands analysis and synthesis across many documents. It spans over 80,000 pages and contains 332 analytical QA instances, constructed via distant supervision over document-level metadata and annotated financial databases. The benchmark also introduces an evaluation protocol that scores final-answer accuracy and reports intermediate-fact coverage as an auxiliary diagnostic signal. Experiments show that standard retrieval-augmented generation (RAG) systems struggle with this task. The paper is available on arXiv.
Key facts
- MuDABench is a benchmark for multi-document analytical QA
- It covers over 80,000 pages and 332 QA instances
- Constructed via distant supervision using metadata and financial databases
- Requires extensive inter-document analysis and aggregation
- Evaluation measures answer accuracy and intermediate-fact coverage
- Standard RAG systems perform poorly on this benchmark
- Paper published on arXiv with ID 2604.22239
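The summary does not specify how intermediate-fact coverage is computed in MuDABench. As a rough illustration only, such a diagnostic can be sketched as recall over a set of annotated gold facts, checking whether each appears in the model's output; the function name and the substring-matching criterion below are assumptions, not the benchmark's actual protocol (which might use fuzzy matching or entailment instead):

```python
def fact_coverage(gold_facts: list[str], predicted_text: str) -> float:
    """Fraction of gold intermediate facts mentioned in a model's output.

    Hypothetical sketch: uses case-insensitive substring matching as the
    hit criterion; a real protocol could use entailment or fuzzy matching.
    Returns 0.0 when there are no gold facts to check.
    """
    if not gold_facts:
        return 0.0
    text = predicted_text.lower()
    hits = sum(1 for fact in gold_facts if fact.lower() in text)
    return hits / len(gold_facts)


# Example: one of two gold facts appears in the model's answer.
coverage = fact_coverage(
    ["revenue rose 12%", "net income declined"],
    "The filings show that revenue rose 12% year over year.",
)
```

A coverage score well below the final-answer accuracy would suggest the model reaches answers without surfacing the supporting evidence, which is the kind of diagnostic signal the protocol is described as providing.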
Entities
Institutions
- arXiv