Stochastic Backtracking Improves Test-Time Scaling for LLMs

ai-technology · 2026-05-26

A new arXiv paper (2605.25143) introduces stochastic backtracking to improve test-time scaling for language model reasoning. The method maintains a persistent pool of historical prefixes, so the model can revisit previously generated states rather than only expanding the current frontier. This targets two failure modes of search guided by a process reward model (PRM): premature commitment and diversity collapse. Of the two mechanisms proposed, Subpool Selection applies Top-N selection within random subpools to strengthen greedy search. The approach aims to maximize accuracy while minimizing the total number of generated tokens.
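To make the idea of Top-N selection within random subpools concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name subpool_select, its parameters, the (text, score) representation of a prefix, and the choice to draw each subpool uniformly without replacement are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical representation: a prefix is (text, score), where the score
# stands in for whatever a process reward model would assign.
Prefix = Tuple[str, float]


def subpool_select(
    pool: List[Prefix],
    num_subpools: int,
    subpool_size: int,
    top_n: int,
    rng: random.Random,
) -> List[Prefix]:
    """Draw random subpools from the pool and keep the top_n of each.

    Assumption: subpools are sampled uniformly without replacement, and
    the survivors of all subpools are merged with duplicates removed.
    """
    selected = {}
    for _ in range(num_subpools):
        k = min(subpool_size, len(pool))
        subpool = rng.sample(pool, k)
        # Greedy Top-N, but only among the members of this random subpool.
        best = sorted(subpool, key=lambda p: p[1], reverse=True)[:top_n]
        for text, score in best:
            selected[text] = score
    return sorted(selected.items(), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    pool = [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.3)]
    print(subpool_select(pool, num_subpools=2, subpool_size=2, top_n=1,
                         rng=random.Random(0)))
```

In this sketch, a lower-scoring prefix can survive whenever the global best happens not to fall in its subpool, which is one plausible way randomness could counter diversity collapse. When subpool_size covers the whole pool, the function reduces to ordinary Top-N selection.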

Key facts

Paper arXiv:2605.25143 introduces stochastic backtracking for test-time scaling.
Method uses a persistent pool of historical prefixes.
Allows revisiting previously generated states.
Addresses premature commitment and diversity collapse.
Proposes Subpool Selection mechanism.
Subpool Selection applies Top-N within random subpools.
Aims to maximize accuracy while minimizing tokens.
Focuses on improving PRM-guided search.
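The persistent-pool idea in the facts above can be sketched as a toy search loop. This is a rough illustration rather than the paper's algorithm: search_with_pool, toy_expand, and toy_score are hypothetical names, characters stand in for tokens, and the toy scorer stands in for a PRM.

```python
import random
from typing import Callable, List, Tuple


def toy_expand(prefix: str, rng: random.Random) -> str:
    """Stand-in for the language model: append one random 'token'."""
    return prefix + rng.choice("ab")


def toy_score(prefix: str) -> int:
    """Stand-in for a PRM: reward 'a' steps, penalize 'b' steps."""
    return prefix.count("a") - prefix.count("b")


def search_with_pool(
    expand: Callable[[str, random.Random], str],
    score: Callable[[str], int],
    root: str,
    steps: int,
    rng: random.Random,
    subpool_size: int = 3,
) -> Tuple[str, List[str], int]:
    """Search that keeps every generated prefix available for revisiting."""
    pool = [root]   # persistent: prefixes are never discarded
    tokens = 0      # total generated tokens, the cost to keep low
    for _ in range(steps):
        # Any historical prefix may be drawn, not only the newest ones,
        # so the search can back up from a branch that went badly.
        subpool = rng.sample(pool, min(subpool_size, len(pool)))
        parent = max(subpool, key=score)
        child = expand(parent, rng)
        tokens += len(child) - len(parent)
        pool.append(child)
    return max(pool, key=score), pool, tokens


if __name__ == "__main__":
    best, pool, tokens = search_with_pool(
        toy_expand, toy_score, root="", steps=20, rng=random.Random(0)
    )
    print(best, len(pool), tokens)
```

A frontier-only search would discard older prefixes after each step. Here they stay in the pool, so an early commitment to a weak branch can be undone at the cost of one more expansion.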
