ARTFEED — Contemporary Art Intelligence

GuardedRepair Improves LLM Math Reasoning Accuracy

ai-technology · 2026-05-26

GuardedRepair, a new framework, addresses the asymmetric risk of post-hoc repair in LLM mathematical reasoning: correcting an erroneous trace helps, but altering a correct one does harm. The system operates in a selective-replacement setting, deciding whether a repaired candidate is more trustworthy than the original cached trace. GuardedRepair combines lightweight symbolic checks, surface-level semantic-risk diagnostics, bounded candidate generation, and conservative acceptance policies. On the full GSM8K test set, where the initial reasoner is 95.60% accurate, GuardedRepair raises final accuracy to 96.89%, correcting 17 of the 58 remaining errors without breaking any correct trace.
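
The reported percentages can be checked with a few lines of arithmetic. The sketch below assumes the standard GSM8K test split of 1,319 problems, a figure that is not stated in the summary itself; the variable and function names are illustrative only.

```python
# Sanity check of the reported accuracy figures.
# Assumption: the GSM8K test split has 1,319 problems (not stated in the summary).
TEST_SET_SIZE = 1319
ERRORS_BEFORE = 58   # errors left by the initial reasoner
ERRORS_FIXED = 17    # errors corrected by repair
CORRECT_BROKEN = 0   # correct traces damaged by repair


def accuracy_pct(errors: int, total: int) -> float:
    """Return accuracy as a percentage rounded to two decimals."""
    return round(100 * (total - errors) / total, 2)


initial = accuracy_pct(ERRORS_BEFORE, TEST_SET_SIZE)
final = accuracy_pct(ERRORS_BEFORE - ERRORS_FIXED + CORRECT_BROKEN, TEST_SET_SIZE)

print(f"initial accuracy: {initial:.2f}%")
print(f"final accuracy:   {final:.2f}%")
```

Under that assumption, 58 errors correspond to 1,261 correct answers before repair and 41 errors to 1,278 correct answers after it, which matches the 95.60% and 96.89% figures.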

Key facts

  • GuardedRepair is a guarded best-of-N repair framework for LLM mathematical reasoning.
  • It diagnoses cached reasoning traces and selectively triggers repair.
  • It accepts answer-changing candidates only when deterministic verification guards support replacement.
  • The framework combines symbolic checks, semantic-risk diagnostics, bounded candidate generation, and conservative acceptance policies.
  • Evaluated on the full GSM8K test set, where the initial reasoner's accuracy is 95.60%.
  • Final accuracy improved to 96.89%.
  • Fixed 17 of 58 remaining errors without breaking correct traces.
  • The paper is available on arXiv with ID 2605.24613.
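
The diagnose, repair, and accept loop described in the key facts can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the arithmetic-step regex, the "#### answer" final-answer convention, the `generate` callable, and the `n` and `min_votes` parameters are all assumptions introduced here.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical symbolic check: matches simple binary steps such as "12 * 3 = 36".
# Chained expressions are not parsed correctly and may be flagged; a false flag
# only triggers repair, which is still subject to the acceptance guards below.
NUM = r"(-?\d+(?:\.\d+)?)"
STEP = re.compile(NUM + r"\s*([+\-*/])\s*" + NUM + r"\s*=\s*" + NUM)
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def arithmetic_errors(trace: str) -> int:
    """Count binary arithmetic steps whose stated result is wrong."""
    errors = 0
    for a, op, b, c in STEP.findall(trace):
        x, y, z = float(a), float(b), float(c)
        if op == "/" and y == 0:
            errors += 1
        elif abs(OPS[op](x, y) - z) > 1e-6:
            errors += 1
    return errors


def final_answer(trace: str) -> Optional[str]:
    """Extract the answer after '####' (assumed output convention)."""
    match = re.search(r"####\s*(-?\d+(?:\.\d+)?)", trace)
    return match.group(1) if match else None


def guarded_repair(
    cached: str,
    generate: Callable[[str, int], List[str]],  # hypothetical candidate generator
    n: int = 4,
    min_votes: int = 2,
) -> str:
    """Return a repaired trace only if the guards support replacing the cached one."""
    # 1. Diagnose: a trace with no flagged step is kept and repair is not triggered.
    if arithmetic_errors(cached) == 0:
        return cached
    # 2. Bounded generation: consider at most n candidates.
    candidates = generate(cached, n)[:n]
    # 3. Deterministic guard: candidates must pass the same symbolic check.
    valid = [c for c in candidates
             if arithmetic_errors(c) == 0 and final_answer(c) is not None]
    if not valid:
        return cached
    # 4. Conservative acceptance: an answer-changing candidate needs enough agreement.
    answer, votes = Counter(final_answer(c) for c in valid).most_common(1)[0]
    if answer == final_answer(cached) or votes < min_votes:
        return cached
    return next(c for c in valid if final_answer(c) == answer)
```

The design point this sketch tries to capture is the asymmetry: every failure path returns the cached trace, so a replacement happens only when the check and the agreement threshold both pass.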

Entities

Institutions

  • arXiv