Re$^2$Math: New Benchmark for Theorem Retrieval in Mathematical Proofs
Researchers have introduced Re$^2$Math, a benchmark that evaluates how well large language models retrieve relevant theorems from the mathematical literature while constructing a proof. Each test instance is built from a candidate instrumental citation in the proof of a main theorem and comes with hierarchical context and an optional, leakage-controlled anchor hint. The task is source-grounded yet citation-agnostic: a model must point to an actual source, but any admissible theorem sufficient for the proof transition is accepted. Evaluation runs against a release-frozen retrieval artifact to ensure reproducibility. The work addresses the need for AI assistants that can determine whether a needed lemma exists, identify suitable scholarly sources, and verify that a result's assumptions align with the current proof context. The paper is available on arXiv.
Key facts
- Re$^2$Math is a benchmark for tool-grounded retrieval from partial mathematical proofs.
- Each instance is built from a candidate instrumental citation in the proof of a main theorem.
- Hierarchical context and an optional leakage-controlled anchor hint are provided.
- The task is source-grounded yet citation-agnostic.
- Any admissible theorem sufficient for the proof transition is accepted.
- Evaluation uses a release-frozen retrieval artifact for reproducibility.
- The benchmark targets large language models' capabilities in research-level mathematics.
- The paper is available on arXiv under the identifier 2605.09012.
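
To make the "source-grounded yet citation-agnostic" idea concrete, here is a minimal Python sketch. It is not the benchmark's actual schema or scoring code: the class `ProofInstance`, its field names, and the `is_admissible` callback are all hypothetical, and how admissibility is really decided is not described in this summary. The sketch only contrasts scoring against the originally cited source with scoring that accepts any admissible result.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ProofInstance:
    """Hypothetical record for one benchmark instance (all field names are assumptions)."""
    main_theorem: str                  # statement of the main theorem being proved
    context: Sequence[str]             # hierarchical context, outermost level first
    proof_step: str                    # proof transition that needs a supporting result
    cited_source: str                  # source cited at this step in the original proof
    anchor_hint: Optional[str] = None  # optional hint; None when it is withheld


def exact_citation_score(instance: ProofInstance, retrieved_source: str) -> bool:
    """Citation-bound scoring: only the originally cited source counts."""
    return retrieved_source == instance.cited_source


def citation_agnostic_score(
    instance: ProofInstance,
    retrieved_source: str,
    is_admissible: Callable[[ProofInstance, str], bool],
) -> bool:
    """Citation-agnostic scoring: any result the (assumed) admissibility check accepts counts."""
    return is_admissible(instance, retrieved_source)
```

Under this sketch, a model that retrieves a different but sufficient theorem would fail `exact_citation_score` and pass `citation_agnostic_score`, which is the distinction the key facts above describe.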
Entities
Institutions
- arXiv