ARTFEED — Contemporary Art Intelligence

MPDocBench-Parse: New Benchmark for Multi-Page Document Parsing

other · 2026-05-23

Researchers have released MPDocBench-Parse, a benchmark for evaluating multi-page document parsing in realistic, practical scenarios. It addresses a gap in existing benchmarks, which focus on single-page or text-centric settings. The benchmark comprises 433 manually annotated documents totaling 3,246 pages (roughly 7.5 pages per document on average), spanning 15 document types in English and Chinese with diverse layouts. It supports document-level end-to-end evaluation and includes a comprehensive protocol that assesses both content fidelity and logical structure recovery. The goal is to advance document parsing by providing a more practically relevant evaluation framework.

Key facts

  • MPDocBench-Parse is a benchmark for multi-page document parsing.
  • It contains 433 manually annotated documents with 3,246 pages.
  • Covers 15 document types in English and Chinese.
  • Supports document-level end-to-end evaluation.
  • Designed for realistic, practical scenarios.
  • Addresses gaps in existing benchmarks that focus on single-page or text-centric settings.
  • Includes a comprehensive protocol for content fidelity and logical structure recovery.
  • Published on arXiv with ID 2605.22100.
