GuardedRepair Improves LLM Math Reasoning Accuracy
GuardedRepair, a newly developed framework, addresses the asymmetric risk of post-hoc repair in LLM mathematical reasoning: correcting an erroneous trace helps, but altering a correct one does harm. The system operates in a selective-replacement setting, deciding whether a repaired candidate is more reliable than the original cached trace. It combines lightweight symbolic checks, surface semantic-risk diagnostics, bounded candidate generation, and conservative acceptance policies. On the complete GSM8K test set, where the initial reasoner reaches 95.60% accuracy, GuardedRepair raises final accuracy to 96.89%, fixing 17 of the 58 remaining errors without breaking any correct trace.
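The reported figures can be checked with a few lines of arithmetic. The sketch below assumes the GSM8K test split contains 1,319 problems; that size is not stated in this summary, and the variable names are illustrative only.

```python
# Assumption: the GSM8K test split has 1,319 problems (not stated in the summary).
TEST_SET_SIZE = 1319

initial_errors = 58   # errors left by the initial reasoner
repaired = 17         # erroneous traces fixed by repair
broken = 0            # correct traces damaged by repair

initial_correct = TEST_SET_SIZE - initial_errors
final_correct = initial_correct + repaired - broken

initial_acc = 100 * initial_correct / TEST_SET_SIZE
final_acc = 100 * final_correct / TEST_SET_SIZE

print(f"{initial_acc:.2f}% -> {final_acc:.2f}%")  # 95.60% -> 96.89%
```

Under that assumption, 58 errors correspond to the stated 95.60% starting accuracy, and 17 net fixes with zero regressions give the stated 96.89%.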
Key facts
- GuardedRepair is a guarded best-of-N repair framework for LLM mathematical reasoning.
- It diagnoses cached reasoning traces and selectively triggers repair.
- It accepts answer-changing candidates only when deterministic verification guards support replacement.
- The framework combines symbolic checks, semantic-risk diagnostics, bounded candidate generation, and conservative acceptance policies.
- Evaluated on the complete GSM8K test set, where the initial reasoner reaches 95.60% accuracy.
- Final accuracy rose to 96.89% after repair.
- Fixed 17 of the 58 remaining errors without breaking any correct trace.
- The paper is available on arXiv with ID 2605.24613.
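The selective-replacement idea in the facts above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `Trace`, `arithmetic_guard`, `guarded_repair`, and `max_candidates` are hypothetical names, and the toy guard only re-checks steps written as `a op b = c`.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trace:
    answer: str
    steps: List[str]

# Hypothetical toy guard: every step must be an integer equation "a op b = c".
STEP = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def arithmetic_guard(trace: Trace) -> bool:
    """Deterministic check: all steps parse and are arithmetically correct."""
    if not trace.steps:
        return False  # nothing to verify, so do not support replacement
    for step in trace.steps:
        m = STEP.match(step)
        if m is None:
            return False  # unparseable step: be conservative
        a, op, b, c = int(m[1]), m[2], int(m[3]), int(m[4])
        if OPS[op](a, b) != c:
            return False
    return True

def guarded_repair(
    cached: Trace,
    needs_repair: Callable[[Trace], bool],
    generate_candidates: Callable[[Trace], List[Trace]],
    guards: List[Callable[[Trace], bool]],
    max_candidates: int = 4,  # assumed bound on candidate generation
) -> Trace:
    """Return a repaired trace only when every guard supports it; else keep the cache."""
    if not needs_repair(cached):
        return cached  # diagnosis found no risk: never touch the trace
    for cand in generate_candidates(cached)[:max_candidates]:
        if cand.answer == cached.answer:
            continue  # not answer-changing, so replacement gains nothing
        if all(guard(cand) for guard in guards):
            return cand
    return cached  # no candidate passed: conservative fallback

# Usage with hand-written traces standing in for model outputs.
cached = Trace("14", ["3 * 4 = 14"])
bad = Trace("13", ["3 * 4 = 13"])
good = Trace("12", ["3 * 4 = 12"])
needs_repair = lambda t: not arithmetic_guard(t)

result = guarded_repair(cached, needs_repair, lambda t: [bad, good], [arithmetic_guard])
untouched = guarded_repair(good, needs_repair, lambda t: [bad], [arithmetic_guard])
fallback = guarded_repair(cached, needs_repair, lambda t: [bad], [arithmetic_guard])
```

The design choice worth noting is the default: whenever diagnosis is clean or no candidate passes every guard, the cached trace is returned unchanged, which is how a policy of this kind can avoid breaking correct traces.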
Entities
Institutions
- arXiv