GuardedRepair Improves LLM Math Reasoning Accuracy

ai-technology · 2026-05-26

GuardedRepair, a newly developed framework, addresses the asymmetric risk of post-hoc repair in LLM mathematical reasoning: fixing an incorrect trace helps, but altering a correct one hurts. The system operates in a selective-replacement setting, deciding whether a repaired candidate is more trustworthy than the original cached trace. GuardedRepair combines lightweight symbolic checks, surface-level semantic-risk diagnostics, bounded candidate generation, and conservative acceptance policies. On the full GSM8K test set, where the base reasoner reaches 95.60% accuracy, GuardedRepair raises final accuracy to 96.89%, fixing 17 of the 58 remaining errors without breaking any correct traces.
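The reported figures are mutually consistent. This short check assumes the standard GSM8K test split of 1,319 problems (a number taken from the public dataset, not from this article). It confirms that 95.60% accuracy leaves 58 errors and that fixing 17 of them, with none broken, gives 96.89%.

```python
# Assumption: the full GSM8K test split has 1,319 problems.
N_TEST = 1319

initial_acc = 0.9560
initial_correct = round(N_TEST * initial_acc)   # 1260.96 rounds to 1261
remaining_errors = N_TEST - initial_correct     # errors left after the base reasoner

fixed = 17   # errors repaired, as reported
broken = 0   # correct traces damaged, as reported
final_correct = initial_correct + fixed - broken
final_acc = final_correct / N_TEST

print(initial_correct, remaining_errors, final_correct, f"{final_acc:.2%}")
```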

Key facts

GuardedRepair is a guarded best-of-N repair framework for LLM mathematical reasoning.
It diagnoses cached reasoning traces and selectively triggers repair.
It accepts answer-changing candidates only when deterministic verification guards support replacement.
The framework combines symbolic checks, semantic-risk diagnostics, bounded candidate generation, and conservative acceptance policies.
Evaluated on the full GSM8K test set, where the initial reasoner has 95.60% accuracy.
Final accuracy improved to 96.89%.
Fixed 17 of 58 remaining errors without breaking correct traces.
The paper is available on arXiv with ID 2605.24613.
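To make the diagnose, repair, and guard loop concrete, here is a minimal Python sketch under clearly stated assumptions. It is not the paper's implementation. The names `Trace`, `arithmetic_ok`, `guarded_repair`, and `fake_generate` are hypothetical. Using GSM8K-style `<<expr=result>>` calculator annotations as the "lightweight symbolic check" is also an assumption, and so is the acceptance rule "every passing candidate must agree on one new answer". A stub callable stands in for LLM sampling so the example runs offline.

```python
import ast
import operator
import re
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List, Optional

@dataclass
class Trace:
    """Hypothetical trace record: raw reasoning text plus its extracted final answer."""
    text: str
    answer: Optional[Fraction]

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}
# Assumed trace format: GSM8K-style calculator annotations such as "<<48/2=24>>".
_CALC = re.compile(r"<<([^=<>]+)=([^<>]+)>>")

def _eval(node: ast.AST) -> Fraction:
    """Exactly evaluate +, -, *, / over numeric literals; reject anything else."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return Fraction(str(node.value))
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")

def arithmetic_ok(text: str) -> bool:
    """Hypothetical symbolic guard: recompute every annotated arithmetic step."""
    for expr, claimed in _CALC.findall(text):
        try:
            if _eval(ast.parse(expr.strip(), mode="eval")) != Fraction(claimed.strip()):
                return False
        except (ValueError, SyntaxError, ZeroDivisionError):
            return False  # unverifiable steps are treated as failures
    return True

def guarded_repair(original: Trace,
                   needs_repair: Callable[[Trace], bool],
                   generate: Callable[[Trace, int], List[Trace]],
                   n: int = 4) -> Trace:
    """Keep the cached trace unless guarded candidates justify replacing it."""
    if not needs_repair(original):  # selective triggering
        return original
    # Bounded best-of-N: at most n candidates, filtered by the symbolic guard.
    passing = [c for c in generate(original, n)[:n] if arithmetic_ok(c.text)]
    answers = {c.answer for c in passing}
    # Assumed conservative policy: replace only if all passing candidates agree
    # on a single answer and that answer differs from the cached one.
    if len(answers) == 1 and original.answer not in answers:
        return passing[0]
    return original

# Usage with a stub sampler standing in for the LLM (hypothetical).
def fake_generate(trace: Trace, n: int) -> List[Trace]:
    return [Trace("Half of 48 is <<48/2=24>>24. #### 24", Fraction(24))] * n

def diagnose(t: Trace) -> bool:
    """Hypothetical diagnostic: flag traces whose arithmetic does not check out."""
    return not arithmetic_ok(t.text)

cached = Trace("Half of 48 is <<48/2=25>>25. #### 25", Fraction(25))
result = guarded_repair(cached, diagnose, fake_generate)
print(result.answer)  # 24
```

In this sketch every path that fails a guard falls back to the cached trace. A trace the diagnostic does not flag is never touched, and a flagged one changes only when the verified candidates converge on a different answer. This mirrors the asymmetry the framework is built around.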
