Re$^2$Math: New Benchmark for Theorem Retrieval in Mathematical Proofs

other · 2026-05-12

Researchers have introduced Re$^2$Math, a benchmark that evaluates how well large language models retrieve relevant theorems from the mathematical literature while constructing a proof. Each test instance is derived from a citation within the proof of a main theorem and comes with hierarchical context and an optional, leakage-controlled anchor hint. The task is source-grounded yet citation-agnostic: it accepts any admissible theorem sufficient for the proof transition, not only the one originally cited. Evaluation runs against a release-frozen retrieval artifact to ensure reproducibility. The work addresses the need for AI assistants that can determine whether a needed lemma exists, identify suitable scholarly sources, and verify that a result's assumptions align with the current proof context. The paper is available on arXiv.

Key facts

Re$^2$Math is a benchmark for tool-grounded retrieval from partial mathematical proofs.
Each instance is built from a candidate instrumental citation in the proof of a main theorem.
Hierarchical context and an optional leakage-controlled anchor hint are provided.
The task is source-grounded yet citation-agnostic.
Any admissible theorem sufficient for the proof transition is accepted.
Evaluation uses a release-frozen retrieval artifact for reproducibility.
The benchmark targets large language models' capabilities in research-level mathematics.
The paper is published on arXiv with ID 2605.09012.
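
To make the facts above concrete, here is a minimal Python sketch of what one instance and a citation-agnostic acceptance check might look like. It is illustrative only: the summary does not describe the benchmark's actual schema or scoring procedure, so the class name `RetrievalInstance`, every field name, the function `is_accepted`, and the `source-A`/`source-B`/`source-C` labels are all assumptions. In particular, reducing admissibility to membership in a pre-labelled set is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class RetrievalInstance:
    """Hypothetical record for one benchmark instance (all field names are assumptions)."""
    main_theorem: str                  # statement whose proof contains the citation
    context: Tuple[str, ...]           # hierarchical context, outermost level first
    proof_step: str                    # the transition that needs an external result
    original_citation: str             # source the authors actually cited
    anchor_hint: Optional[str] = None  # optional hint; None when it is withheld


def is_accepted(retrieved_id: str, admissible_ids: FrozenSet[str]) -> bool:
    """Citation-agnostic check: any admissible theorem counts, not only the cited one."""
    return retrieved_id in admissible_ids


# Placeholder instance; none of these strings come from the benchmark itself.
instance = RetrievalInstance(
    main_theorem="Main theorem (placeholder)",
    context=("Section (placeholder)", "Lemma (placeholder)"),
    proof_step="Step that needs an external result (placeholder)",
    original_citation="source-A",
)

# Assumed, hand-labelled set of sources that suffice for the proof step.
admissible = frozenset({"source-A", "source-B"})

print(is_accepted("source-B", admissible))  # different source, still sufficient
print(is_accepted("source-C", admissible))  # source that does not suffice
```

In this sketch the first check succeeds even though `source-B` is not the original citation, which is the sense of "citation-agnostic" used above; the second check fails because `source-C` is not in the assumed admissible set.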
