ARTFEED — Contemporary Art Intelligence

MuDABench: New Benchmark for Multi-Document Analytical QA

other · 2026-04-27

Researchers have introduced MuDABench, a new benchmark for multi-document analytical question answering (QA) over large, semi-structured document collections. Unlike existing multi-document QA benchmarks, which use small document sets and demand little cross-document reasoning, MuDABench requires analysis and synthesis across many documents. It spans over 80,000 pages and contains 332 analytical QA instances, constructed via distant supervision using document-level metadata and annotated financial databases. The benchmark also introduces an evaluation protocol that measures final-answer accuracy and uses intermediate-fact coverage as an auxiliary diagnostic signal. Experiments show that standard RAG systems struggle with this task. The paper is available on arXiv.
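For illustration, the intermediate-fact coverage signal could be approximated as the share of annotated gold facts recoverable from a system's answer. The function below is a hedged sketch using simple case-insensitive string matching; the function name and matching criterion are assumptions, not the paper's actual protocol.

```python
def fact_coverage(gold_facts, answer_text):
    """Fraction of annotated intermediate facts found in the answer.

    A simplified string-containment proxy for the coverage signal;
    the benchmark's real matching procedure may be more sophisticated.
    """
    if not gold_facts:
        return 0.0
    answer = answer_text.lower()
    hits = sum(1 for fact in gold_facts if fact.lower() in answer)
    return hits / len(gold_facts)


# Example: one of two gold facts appears in the answer -> coverage 0.5
coverage = fact_coverage(
    ["revenue rose 12%", "net loss narrowed"],
    "Revenue rose 12% while operating costs fell.",
)
```

Such a diagnostic helps distinguish retrieval failures (facts never surfaced) from synthesis failures (facts retrieved but the final answer still wrong).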

Key facts

  • MuDABench is a benchmark for multi-document analytical QA
  • It covers over 80,000 pages and 332 QA instances
  • Constructed via distant supervision using metadata and financial databases
  • Requires extensive cross-document analysis and aggregation
  • Evaluation measures answer accuracy and intermediate-fact coverage
  • Standard RAG systems perform poorly on this benchmark
  • Paper published on arXiv with ID 2604.22239

Entities

Institutions

  • arXiv

Sources