Post-Reasoning: Boosting LLM Performance Without Extra Token Cost
A new method called Post-Reasoning improves instruction-tuned large language models by conditioning them to justify their answers after, rather than before, the final response, so the technique adds no extra latency or token cost at answer time. The approach was evaluated across 117 model–benchmark settings, spanning 13 models from 4 model families and 9 benchmarks including AMC, HMMT, GSM8K, and GPQA. Results show performance gains without extra inference overhead.
Key facts
- Post-Reasoning improves instruction-tuned models by conditioning them to justify answers after generating the final response.
- The method eliminates additional latency and token costs.
- Evaluated across 117 model–benchmark settings.
- Tested on 13 open and proprietary models.
- Covers 4 model families.
- Evaluated on 9 diverse reasoning and knowledge-intensive benchmarks.
- Benchmarks include AMC, HMMT, GSM8K, and GPQA.
- The approach is simple and effective, requiring only instruction augmentation.
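Since the method requires only instruction augmentation, it can be sketched as a prompt transformation. The exact wording of the paper's augmented instruction is not given here; the suffix text below is an illustrative assumption, not the authors' prompt:

```python
def augment_for_post_reasoning(instruction: str) -> str:
    """Append a post-reasoning directive to a task instruction.

    The directive asks the model to state its final answer first and
    justify it afterwards. The wording is hypothetical; the paper's
    actual instruction text may differ.
    """
    post_reasoning_suffix = (
        "\n\nState your final answer directly first. "
        "After the answer, explain the reasoning that supports it."
    )
    return instruction + post_reasoning_suffix

# Example: augment a GSM8K-style question before sending it to a model.
prompt = augment_for_post_reasoning(
    "A train travels 60 miles in 1.5 hours. What is its average speed?"
)
print(prompt)
```

Because the augmentation happens entirely in the prompt, no fine-tuning or decoding changes are needed, which is consistent with the claim of zero additional inference overhead at answer time.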