ARTFEED — Contemporary Art Intelligence

Re$^2$Math: New Benchmark for Theorem Retrieval in Mathematical Proofs

other · 2026-05-12

Researchers have introduced Re$^2$Math, a benchmark that evaluates how well large language models retrieve relevant theorems from the mathematical literature while constructing a proof. Each test instance is built from a candidate instrumental citation in the proof of a main theorem and comes with hierarchical context and an optional, leakage-controlled anchor hint. The task is source-grounded yet citation-agnostic: any admissible theorem sufficient for the proof transition is accepted, not only the one originally cited. Evaluation uses a release-frozen retrieval artifact to ensure reproducibility. The work addresses the need for AI assistants that can determine whether a needed lemma exists, identify suitable scholarly sources, and verify that a result's assumptions align with the current proof context. The paper is available on arXiv.

Key facts

  • Re$^2$Math is a benchmark for tool-grounded retrieval from partial mathematical proofs.
  • Each instance is built from a candidate instrumental citation in the proof of a main theorem.
  • Hierarchical context and an optional leakage-controlled anchor hint are provided.
  • The task is source-grounded yet citation-agnostic.
  • Any admissible theorem sufficient for the proof transition is accepted.
  • Evaluation uses a release-frozen retrieval artifact for reproducibility.
  • The benchmark targets large language models' capabilities in research-level mathematics.
  • The paper is available on arXiv under ID 2605.09012.
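
The key facts above describe the task only in outline, so the following is a minimal sketch of how a citation-agnostic check against a frozen corpus might look, not the benchmark's actual schema or scorer. Every name here (Instance, FROZEN_CORPUS, retrieve, is_correct, the theorem IDs) is hypothetical, the admissibility labels are toy data chosen for illustration, and the word-overlap retriever is a stand-in for whatever retrieval tool a model would really use.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instance:
    """One hypothetical benchmark instance; all field names are assumptions."""
    main_theorem: str               # theorem whose proof contains the citation
    context: tuple[str, ...]        # hierarchical context, outermost level first
    proof_transition: str           # the step that needs an external result
    anchor_hint: Optional[str]      # optional hint; None when withheld
    cited_id: str                   # the result the original proof cited
    admissible_ids: frozenset[str]  # every result judged sufficient for the step


# Toy stand-in for a release-frozen retrieval artifact: a fixed mapping that
# never changes between runs, so results stay comparable.
FROZEN_CORPUS: dict[str, str] = {
    "thm:bolzano-weierstrass": "every bounded sequence in R^n has a convergent subsequence",
    "thm:compact-metric": "every sequence in a compact metric space has a convergent subsequence",
    "thm:heine-borel": "a subset of R^n is compact if and only if it is closed and bounded",
}


def retrieve(query: str, corpus: dict[str, str]) -> str:
    """Return the ID whose statement shares the most words with the query."""
    words = set(query.lower().split())
    # Iterating over sorted IDs makes tie-breaking deterministic.
    return max(sorted(corpus), key=lambda k: len(words & set(corpus[k].lower().split())))


def is_correct(instance: Instance, retrieved_id: str) -> bool:
    """Citation-agnostic check: any admissible result counts, not only cited_id."""
    return retrieved_id in instance.admissible_ids


instance = Instance(
    main_theorem="A continuous function on a closed bounded subset of R^n attains its maximum.",
    context=("Section 2: Compactness", "Proof of the main theorem"),
    proof_transition="extract a convergent subsequence from a bounded sequence in R^n",
    anchor_hint=None,
    cited_id="thm:compact-metric",
    # Toy labels, assumed for illustration only.
    admissible_ids=frozenset({"thm:bolzano-weierstrass", "thm:compact-metric"}),
)

predicted = retrieve(instance.proof_transition, FROZEN_CORPUS)
```

In this toy run the retriever returns a result other than the one originally cited, and is_correct still accepts it because it is in the admissible set. That is the sense in which the sketch is citation-agnostic; how the real benchmark decides admissibility is not covered by this summary.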

Entities

Institutions

  • arXiv

Sources