ARTFEED — Contemporary Art Intelligence

QED: Open-Source Multi-Agent System for Mathematical Proofs

ai-technology · 2026-04-29

A team of researchers has released QED, an open-source multi-agent system for producing original mathematical proofs of open research problems. The system targets eight failure modes of large language models (LLMs) that undermine reliable proof generation: context contamination, citation hallucination, hand-waving on key steps, misallocation of proof effort, unstable proof plans, unfocused verification, problem modification, and a single-model bottleneck. The authors contend that the gap between benchmark success and research-level proving is mainly a matter of system design. QED was evaluated on five open problems in applied analysis and partial differential equations (PDEs). The paper is available on arXiv under the identifier 2604.24021.

Key facts

  • QED is an open-source multi-agent system for generating mathematical proofs.
  • It targets open research problems in mathematics.
  • Eight failure modes in LLMs are identified: context contamination, citation hallucination, hand-waving on key steps, misallocation of proof effort, unstable proof plans, unfocused verification, problem modification, and a single-model bottleneck.
  • The system's architecture directly addresses each failure mode.
  • Evaluated on five open problems in applied analysis and PDEs.
  • The paper is available on arXiv with ID 2604.24021.
  • The authors claim the gap between benchmark success and research-level proving is due to system design.
  • Frontier LLMs were used in systematic experiments.

Entities

Institutions

  • arXiv

Sources