MPDocBench-Parse: New Benchmark for Multi-Page Document Parsing
Researchers have introduced MPDocBench-Parse, a benchmark for evaluating multi-page document parsing in realistic settings. It targets a gap in existing benchmarks, which focus on single-page or text-centric settings. The benchmark comprises 433 manually annotated documents totaling 3,246 pages, spanning 15 document types in English and Chinese with varied layouts. It supports document-level end-to-end evaluation and provides a comprehensive protocol for assessing content fidelity and logical structure recovery. The aim is to give document parsing a more practically relevant evaluation framework.
Key facts
- MPDocBench-Parse is a benchmark for multi-page document parsing.
- It contains 433 manually annotated documents with 3,246 pages.
- Covers 15 document types in English and Chinese.
- Supports document-level end-to-end evaluation.
- Designed for realistic, practical scenarios.
- Addresses gaps in existing benchmarks that focus on single-page or text-centric settings.
- Includes a comprehensive protocol for content fidelity and logical structure recovery.
- Published on arXiv with ID 2605.22100.
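To illustrate why document-level evaluation can differ from page-by-page scoring, here is a minimal Python sketch. It is not the benchmark's protocol: the similarity metric (the standard library's `difflib.SequenceMatcher`), the function names, and the sample pages are all assumptions chosen for illustration. It shows a parser that recovers the full text correctly but splits it across pages differently from the reference.

```python
from difflib import SequenceMatcher


def similarity(pred: str, ref: str) -> float:
    """Similarity ratio in [0, 1]; a stand-in metric, not the benchmark's own."""
    return SequenceMatcher(None, pred, ref, autojunk=False).ratio()


def page_level_score(pred_pages: list[str], ref_pages: list[str]) -> float:
    """Hypothetical page-by-page scoring: average similarity of aligned pages."""
    if not ref_pages or len(pred_pages) != len(ref_pages):
        raise ValueError("page lists must be non-empty and of equal length")
    scores = [similarity(p, r) for p, r in zip(pred_pages, ref_pages)]
    return sum(scores) / len(scores)


def document_level_score(pred_pages: list[str], ref_pages: list[str]) -> float:
    """Hypothetical document-level scoring: compare the concatenated text."""
    return similarity(" ".join(pred_pages), " ".join(ref_pages))


# Made-up example: a sentence crosses the page break, and the parser
# attaches the whole sentence to the second page.
ref_pages = ["Intro text. A sentence that", "continues on page two."]
pred_pages = ["Intro text.", "A sentence that continues on page two."]

page_score = page_level_score(pred_pages, ref_pages)
doc_score = document_level_score(pred_pages, ref_pages)
print(f"page-level: {page_score:.2f}, document-level: {doc_score:.2f}")
```

In this toy case the page-level average penalizes the parser for where the page break falls, while the document-level comparison sees identical text. A real protocol would also need to score logical structure, which this sketch does not attempt.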