MathAtlas: Graduate-Level Math Autoformalization Benchmark
Researchers have introduced MathAtlas, the first large-scale benchmark for autoformalization of graduate-level mathematics. It comprises approximately 52,000 theorems, definitions, exercises, examples, and proofs extracted from 103 graduate textbooks, together with a dependency graph of about 178,000 relations. Strong baseline models reach at most 9.8% correctness on theorem statements and 16.7% on definitions, which underscores how difficult the task remains.
Key facts
- MathAtlas is the first large-scale autoformalization benchmark for graduate-level mathematics.
- It contains approximately 52,000 theorems, definitions, exercises, examples, and proofs.
- The benchmark is extracted from 103 graduate mathematics textbooks.
- It includes a mathematical dependency graph with about 178,000 relations.
- Strong baselines achieve at most 9.8% correctness on theorem statements.
- Strong baselines achieve at most 16.7% correctness on definitions.
- The benchmark is designed to facilitate evaluation and development of dependency-aware autoformalization systems.
- State-of-the-art model performance degrades substantially as items depend on more prior material.
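To make the idea of a dependency graph concrete, here is a minimal Python sketch of how a dependency-aware system might order items so that each one is formalized only after the material it relies on. The item identifiers, the `dependencies` mapping, and both helper functions are hypothetical illustrations; they are not taken from MathAtlas or its tooling.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical items: each key maps to the set of items it depends on.
dependencies = {
    "def:group": set(),
    "def:subgroup": {"def:group"},
    "def:coset": {"def:subgroup"},
    "thm:lagrange": {"def:coset", "def:group"},
}


def formalization_order(deps):
    """Return item ids so that every item comes after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def dependency_depth(deps, item):
    """Length of the longest dependency chain below `item` (0 if it has none)."""
    if not deps[item]:
        return 0
    return 1 + max(dependency_depth(deps, d) for d in deps[item])


order = formalization_order(dependencies)
depth_of_theorem = dependency_depth(dependencies, "thm:lagrange")
```

In this toy graph the theorem sits three steps above the base definition, so an error in any earlier formalization can carry through to it. That is one plausible reading of why performance would drop as dependencies accumulate.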