ARTFEED — Contemporary Art Intelligence

Verbal Process Supervision Boosts LLM Reasoning Without Training

ai-technology · 2026-04-25

Verbal Process Supervision (VPS) is a training-free method that improves large language model reasoning by injecting structured natural-language critiques from a stronger supervisor model. On the GPQA Diamond benchmark, GPT-5.4 reaches 94.9% accuracy with a round budget of R=4, surpassing the previous 94.1% state of the art without any gradient updates. On AIME 2025, VPS lifts weaker models from 11.7–26.7% to 63.3–90.0%, a gain of up to 63.3 points. At matched compute, VPS beats Reflexion by up to 12.1 points, and Self-Consistency@5 by 5.0 points on GPQA and 8.3 points on LiveCodeBench V6.

Key facts

  • VPS is a training-free framework using structured natural-language critique from a stronger supervisor.
  • On GPQA Diamond, GPT-5.4 (High) paired with GPT-5.4 (Low) achieves 94.9% at R=4, surpassing the 94.1% state of the art.
  • On AIME 2025, VPS boosts weak model scores from 11.7–26.7% to 63.3–90.0% (up to +63.3 points).
  • At matched compute, VPS outperforms Reflexion by +8.5 to +12.1 points.
  • VPS outperforms Self-Consistency@5 by +5.0 pp on GPQA and +8.3 pp on LiveCodeBench V6.
  • VPS introduces a fourth axis: granularity of external verbal supervision.
  • Results cover GPQA Diamond, AIME 2025, and LiveCodeBench V6.
  • VPS works with both closed and open models.

Entities

Institutions

  • arXiv

Benchmarks

  • GPQA Diamond
  • AIME 2025
  • LiveCodeBench V6

Sources