Verbal Process Supervision Boosts LLM Reasoning Without Training
Verbal Process Supervision (VPS) is a training-free method that improves large language model reasoning by injecting structured natural-language critiques from a stronger supervisor model. On the GPQA Diamond benchmark, GPT-5.4 reaches 94.9% accuracy with a round budget of R=4, surpassing the previous state of the art of 94.1% without any gradient updates. On AIME 2025, VPS lifts weaker models from 11.7–26.7% to 63.3–90.0%, a gain of up to 63.3 points. At matched compute, VPS outperforms Reflexion by up to 12.1 points, and beats Self-Consistency@5 by 5.0 points on GPQA and 8.3 points on LiveCodeBench V6.
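The mechanism described above, a stronger supervisor critiquing a weaker worker over up to R rounds, can be sketched in a few lines. Everything below is an illustrative assumption: the function names, the mock worker/supervisor, and the toy arithmetic problem are stand-ins, not the paper's actual prompts or API.

```python
def worker_solve(problem, critique=None):
    # Stand-in for the weaker "worker" model: returns a candidate solution,
    # optionally revised in light of the supervisor's critique.
    if critique is None:
        return {"answer": 10, "steps": "10 - 4 = 10"}  # deliberately flawed draft
    return {"answer": 6, "steps": "10 - 4 = 6"}        # revised after critique

def supervisor_critique(problem, attempt):
    # Stand-in for the stronger supervisor: inspects the worker's reasoning
    # trace and returns a natural-language critique, or None if the process
    # looks sound (VPS supervises the *process*, not just the final answer).
    if "10 - 4 = 10" in attempt["steps"]:
        return "Arithmetic error: 10 - 4 is 6, not 10. Redo the subtraction."
    return None

def vps(problem, rounds=4):
    # Training-free loop: critique-and-revise for up to R rounds,
    # stopping early once the supervisor accepts the reasoning.
    attempt = worker_solve(problem)
    for _ in range(rounds):
        critique = supervisor_critique(problem, attempt)
        if critique is None:
            break
        attempt = worker_solve(problem, critique=critique)
    return attempt["answer"]

print(vps("What is 10 - 4?", rounds=4))  # -> 6
```

Because the loop exits as soon as the critique is empty, the round budget R=4 is an upper bound on supervisor calls, which is what makes the matched-compute comparisons against Reflexion and Self-Consistency@5 meaningful.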
Key facts
- VPS is a training-free framework using structured natural-language critique from a stronger supervisor.
- On GPQA Diamond, the GPT-5.4 (High) | GPT-5.4 (Low) supervisor–worker pairing achieves 94.9% at R=4, surpassing the 94.1% state of the art.
- On AIME 2025, VPS boosts weak model scores from 11.7–26.7% to 63.3–90.0% (up to +63.3 points).
- At matched compute, VPS outperforms Reflexion by +8.5 to +12.1 points.
- VPS outperforms Self-Consistency@5 by +5.0 pp on GPQA and +8.3 pp on LiveCodeBench V6.
- VPS introduces a fourth axis: granularity of external verbal supervision.
- Results cover GPQA Diamond, AIME 2025, and LiveCodeBench V6.
- VPS works with both closed and open models.
Entities
Institutions
- arXiv
Benchmarks
- GPQA Diamond
- AIME 2025
- LiveCodeBench V6