ARTFEED — Contemporary Art Intelligence

Disagreement-Guided Strategy Routing for Test-Time Scaling

other · 2026-04-30

A recent arXiv paper (2604.26644) presents a training-free framework for test-time scaling in large reasoning models (LRMs). The researchers report that disagreement among a model's outputs correlates strongly with both instance difficulty and prediction accuracy, which makes it usable as a signal for selecting a strategy per instance. Instead of spending more computation on a single approach, the framework routes each instance to a scaling strategy according to its level of disagreement: lightweight resolution for consistent instances, majority voting for moderate disagreement, and rewriting for high disagreement. The method aims to improve performance on difficult mathematical reasoning tasks while avoiding the diminishing returns of techniques such as repeated sampling, self-correction, and tree search.
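The routing idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the summary does not say how disagreement is measured or where the cut-offs lie, so the disagreement score (share of answers that differ from the most common one), the function names `disagreement` and `route`, and the thresholds `low` and `high` are all assumptions made for illustration.

```python
from collections import Counter


def disagreement(answers):
    """Share of answers that differ from the most common answer.

    Assumed measure for illustration; the paper may define it differently.
    """
    if not answers:
        raise ValueError("need at least one answer")
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / len(answers)


def route(answers, low=0.0, high=0.5):
    """Pick a scaling strategy from the disagreement level.

    The thresholds `low` and `high` are hypothetical values, not taken
    from the paper.
    """
    d = disagreement(answers)
    if d <= low:
        return "lightweight_resolution"  # outputs are consistent
    if d <= high:
        return "majority_voting"         # moderate disagreement
    return "rewriting"                   # high disagreement


if __name__ == "__main__":
    print(route(["42", "42", "42", "42"]))
    print(route(["42", "42", "42", "7"]))
    print(route(["1", "2", "3", "4"]))
```

In this sketch, four identical answers go to lightweight resolution, one dissenting answer out of four goes to majority voting, and four different answers go to rewriting. In practice the thresholds would need tuning for each model and benchmark.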

Key facts

  • arXiv paper ID: 2604.26644
  • Title: When to Vote, When to Rewrite: Disagreement-Guided Strategy Routing for Test-Time Scaling
  • Focuses on large reasoning models (LRMs) for mathematical reasoning
  • Output disagreement is used as a signal for instance-level strategy selection
  • Framework is training-free
  • Strategies: lightweight resolution, majority voting, rewriting
  • Aims to improve test-time scaling efficiency
  • Addresses diminishing returns of existing methods on hard problems

Entities

Institutions

  • arXiv
