ARTFEED — Contemporary Art Intelligence

SciMDR Dataset Boosts Multimodal Scientific Document Reasoning

ai-technology · 2026-04-30

Researchers have released SciMDR, a new training dataset designed to strengthen cross-modal understanding of scientific literature. It comprises 300K question-and-answer pairs with explicit reasoning chains drawn from 20K scientific papers, built with a synthesize-and-reground framework that balances scale, fidelity, and realism. The pipeline has two stages: Claim-Centric QA Synthesis, which produces accurate, isolated QA pairs and reasoning chains for individual document segments, and Document-Scale Regrounding, which programmatically embeds these pairs into full-document tasks that reflect realistic complexity. The team also released SciMDR-Eval, an expert-annotated benchmark for assessing multimodal comprehension across complete scientific workflows. Experiments show that models fine-tuned on SciMDR achieve notable gains in scientific multimodal document reasoning.
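The two-stage pipeline can be pictured with a minimal sketch. This is not the authors' implementation: the function names, the `QAPair` record, and the templated question are all hypothetical, and a real system would use an LLM where the stub below merely echoes a claim. The sketch only illustrates the shape of the data flow: stage one yields isolated, segment-grounded QA pairs; stage two embeds them in a full-document context with distractor segments.

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str
    reasoning: str
    source_segment: str  # the figure/table/passage the pair was synthesized from

def synthesize_qa(segment_id: str, claim: str) -> QAPair:
    """Stage 1 (hypothetical): Claim-Centric QA Synthesis.

    Produces an accurate, isolated QA pair for a single document segment.
    A real pipeline would call a generative model here; this stub just
    templates the claim to show the output structure.
    """
    return QAPair(
        question=f"What does {segment_id} show?",
        answer=claim,
        reasoning=f"The claim '{claim}' is stated directly in {segment_id}.",
        source_segment=segment_id,
    )

def reground(doc_segments: dict, pairs: list) -> list:
    """Stage 2 (hypothetical): Document-Scale Regrounding.

    Programmatically embeds each isolated pair into a full-document task,
    so answering requires locating the grounding segment among the other
    segments of the paper.
    """
    full_context = "\n".join(f"[{sid}] {text}" for sid, text in doc_segments.items())
    return [
        {
            "context": full_context,
            "question": p.question,
            "answer": p.answer,
            "reasoning": p.reasoning,
            "grounding": p.source_segment,
        }
        for p in pairs
    ]

# Toy paper with two segments; only one grounds the synthesized pair.
segments = {
    "Fig. 2": "Accuracy rises with model scale.",
    "Table 1": "Dataset statistics by domain.",
}
pairs = [synthesize_qa("Fig. 2", "Accuracy rises with model scale.")]
tasks = reground(segments, pairs)
print(tasks[0]["grounding"])  # the task still points back to its source segment
```

The key design point the sketch captures is that fidelity comes from the isolated synthesis step, while realism comes from regrounding, which forces the model to find the right segment inside the whole document rather than being handed it.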

Key facts

  • SciMDR is a large-scale training dataset for cross-modal comprehension.
  • It contains 300K QA pairs with explicit reasoning chains.
  • The dataset spans 20K scientific papers.
  • Constructed using a synthesize-and-reground framework.
  • The framework includes Claim-Centric QA Synthesis and Document-Scale Regrounding.
  • SciMDR-Eval is an expert-annotated benchmark for evaluation.
  • Models fine-tuned on SciMDR show significant improvements.
  • The research is published on arXiv with ID 2603.12249.

Entities

Institutions

  • arXiv

Sources