Stochastic Backtracking Improves Test-Time Scaling for LLMs
A new arXiv paper (arXiv:2605.25143) introduces stochastic backtracking to improve test-time scaling for language model reasoning. The method maintains a persistent pool of historical prefixes, so the search can revisit previously generated states instead of only expanding the current frontier. This targets two failure modes of PRM-guided (process reward model) search: premature commitment and diversity collapse. The paper proposes two mechanisms. One of them, Subpool Selection, applies Top-N selection within random subpools to strengthen greedy search. The stated goal is to maximize accuracy while minimizing the total number of generated tokens.
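To make the idea of a persistent prefix pool concrete, here is a minimal Python sketch. It is not the paper's implementation. The names PrefixPool, search, toy_expand and toy_score are hypothetical, the scoring function is a stand-in for a PRM, and sampling prefixes in proportion to exp(score / temperature) is an assumption of this sketch, since the summary does not state how the paper chooses which prefix to revisit.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class PrefixPool:
    """Keeps every scored prefix seen so far, not only the current frontier."""
    temperature: float = 1.0
    entries: list = field(default_factory=list)  # list of (prefix, score) pairs

    def add(self, prefix, score):
        self.entries.append((prefix, score))

    def sample(self, rng):
        # Assumption: softmax sampling over scores. The paper's actual rule may differ.
        top = max(score for _, score in self.entries)
        weights = [math.exp((score - top) / self.temperature)
                   for _, score in self.entries]
        prefix, _ = rng.choices(self.entries, weights=weights, k=1)[0]
        return prefix


def search(root, expand, score, steps, seed=0):
    """Expand one sampled prefix per step; old prefixes stay eligible."""
    rng = random.Random(seed)
    pool = PrefixPool()
    pool.add(root, score(root))
    for _ in range(steps):
        parent = pool.sample(rng)  # may be an older prefix: this is the backtrack
        for child in expand(parent):
            pool.add(child, score(child))
    return pool


def toy_expand(prefix):
    # Hypothetical generator: each prefix has two possible continuations.
    return [prefix + (0,), prefix + (1,)]


def toy_score(prefix):
    # Hypothetical stand-in for a PRM: reward ones, penalize length.
    return sum(prefix) - 0.1 * len(prefix)
```

Calling search((), toy_expand, toy_score, steps=5) returns a pool that still contains the root and every intermediate prefix, so a later step can resume from any of them rather than being locked into the most recent frontier.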
Key facts
- Paper arXiv:2605.25143 introduces stochastic backtracking for test-time scaling.
- Method uses a persistent pool of historical prefixes.
- Allows revisiting previously generated states.
- Addresses premature commitment and diversity collapse.
- Proposes Subpool Selection mechanism.
- Subpool Selection applies Top-N within random subpools.
- Aims to maximize accuracy while minimizing total generated tokens.
- Focuses on improving PRM-guided search.
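The Subpool Selection step listed above (Top-N within random subpools) can be sketched as follows. This is one plausible reading of that description, not the paper's code. The function name subpool_select, the shuffle-and-split partitioning, and the parameters num_subpools and top_n are assumptions made for illustration.

```python
import random


def subpool_select(scored, num_subpools, top_n, rng):
    """Split candidates into random subpools and keep the Top-N of each.

    scored: list of (prefix, score) pairs.
    Assumption: subpools are formed by shuffling and striding the list.
    """
    shuffled = list(scored)
    rng.shuffle(shuffled)
    subpools = [shuffled[i::num_subpools] for i in range(num_subpools)]

    selected = []
    for subpool in subpools:
        ranked = sorted(subpool, key=lambda item: item[1], reverse=True)
        selected.extend(ranked[:top_n])  # greedy Top-N, applied per subpool
    return selected
```

With top_n of at least 1, the globally best candidate always survives because it ranks first in whichever subpool it lands in. The remaining slots depend on the random split, so lower-scored candidates that a single global Top-N would discard can still be kept.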
Entities
Institutions
- arXiv