Disagreement-Guided Strategy Routing for Test-Time Scaling

other · 2026-04-30

A recent arXiv paper (2604.26644) presents a training-free framework for test-time scaling in large reasoning models (LRMs). The authors report a strong link between output disagreement and both instance difficulty and prediction accuracy, which makes disagreement usable as a signal for choosing a strategy per instance. Rather than spending more computation on a single approach, the framework routes each instance to a scaling strategy according to its level of disagreement: lightweight resolution for consistent instances, majority voting for moderate disagreement, and rewriting for high disagreement. The goal is to improve performance on difficult mathematical reasoning tasks while avoiding the diminishing returns of techniques such as repeated sampling, self-correction, and tree search.
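
The routing idea can be sketched in a few lines of Python. This is an illustration only, not the paper's implementation: the disagreement measure (one minus the share of the most common sampled answer), the thresholds `low` and `high`, and the function and strategy names are all assumptions chosen for the example.

```python
from collections import Counter


def disagreement(answers):
    """Fraction of sampled answers that differ from the most common one.

    Hypothetical measure: 0.0 means all samples agree; values near 1.0
    mean the samples are spread over many different answers.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


def route(answers, low=0.2, high=0.6):
    """Pick a scaling strategy from the disagreement level.

    The thresholds `low` and `high` are illustrative assumptions,
    not values taken from the paper.
    """
    d = disagreement(answers)
    if d <= low:
        return "lightweight_resolution"  # samples are consistent
    if d <= high:
        return "majority_voting"         # moderate disagreement
    return "rewriting"                   # high disagreement


if __name__ == "__main__":
    print(route(["42"] * 8))                                  # all samples agree
    print(route(["42", "42", "42", "41", "42", "40", "42", "42"]))
    print(route(["1", "2", "3", "4", "5", "6", "7", "7"]))
```

With these assumed thresholds, eight identical samples go to lightweight resolution, a 6-of-8 majority goes to majority voting, and samples with no clear majority go to rewriting. In practice the thresholds would need to be tuned for the model and the benchmark.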

Key facts

arXiv paper ID: 2604.26644
Title: When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
Focuses on large reasoning models (LRMs) for mathematical reasoning
Output disagreement is used as a signal for instance-level strategy selection
Framework is training-free
Strategies: lightweight resolution, majority voting, rewriting
Aims to improve test-time scaling efficiency
Addresses diminishing returns of existing methods on hard problems
