New AI Framework DAP Advances Automated Theorem Proving in Hard Mode Setting

ai-technology · 2026-04-20

Researchers have introduced Discover And Prove (DAP), an agentic framework for automated theorem proving (ATP) that uses large language models to reason in natural language, with explicit self-reflection, and discover an answer before constructing a formal proof. DAP targets what the authors call "Hard Mode": a stricter, more realistic setting in which a system must find the answer itself rather than receive it embedded in the formal statement. Once an answer is found, the framework rewrites the Hard Mode statement into an "Easy Mode" one that existing automated theorem provers can handle. The authors report new state-of-the-art results: on CombiBench, DAP raises the number of solved problems from 7 to 10, and on PutnamBench it is described as the first system to achieve success in Hard Mode. To support Hard Mode research, the team released MiniF2F-Hard and FIMO-Hard, expert-reannotated Hard Mode variants of two widely used ATP benchmarks. The work argues that most ATP benchmarks embed the final answer in the formal statement, which makes the task easier than the one human competitors face and may lead to optimistic estimates of model capability. The paper was posted to arXiv as 2604.15839v1.
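
The difference between the two settings can be illustrated with a minimal Lean 4 sketch. The toy problem and every name in it (toy_easy, toy_answer, toy_hard, toy_answer_found, toy_rewritten) are hypothetical and are not taken from the paper or its benchmarks; the sketch only shows the general shape of an Easy Mode statement, a Hard Mode statement with an undiscovered answer, and the statement obtained after an answer is substituted.

```lean
-- Illustrative toy problem (an assumption, not from the paper): "What is 2 + 2?"

-- Easy Mode: the answer 4 is written directly into the statement,
-- so the prover only has to verify it.
theorem toy_easy : (2 : Nat) + 2 = 4 := rfl

-- Hard Mode: the answer is a placeholder the system must discover.
-- `sorry` marks the parts that have not been filled in yet.
def toy_answer : Nat := sorry

theorem toy_hard : (2 : Nat) + 2 = toy_answer := sorry

-- After an answer is discovered (here, 4), substituting it yields an
-- Easy Mode statement that an existing prover can attempt.
def toy_answer_found : Nat := 4

theorem toy_rewritten : (2 : Nat) + 2 = toy_answer_found := rfl
```

In this sketch the discovery step is the part Hard Mode adds: nothing in toy_hard tells the prover that the answer is 4, whereas toy_easy and toy_rewritten only ask for verification.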

Key facts

DAP framework uses LLM natural-language reasoning with explicit self-reflection
Addresses "Hard Mode" where systems must independently discover answers
Rewrites Hard Mode statements into Easy Mode for existing ATP provers
Sets state of the art: raises CombiBench solved problems from 7 to 10
Reported as the first system to achieve success on PutnamBench in Hard Mode
Researchers released MiniF2F-Hard and FIMO-Hard benchmark variants
Most ATP benchmarks embed answers in statements ("Easy Mode")
Research announced on arXiv with identifier 2604.15839v1
