Post-Reasoning: Boosting LLM Performance Without Extra Token Cost
A new method called Post-Reasoning improves instruction-tuned large language models by conditioning them to justify their answers after, rather than before, the final response, so the technique adds no extra latency or token cost at answer time. The approach was evaluated across 117 model–benchmark settings, spanning 13 models from 4 model families and 9 benchmarks including AMC, HMMT, GSM8K, and GPQA. Results show performance gains without extra inference overhead.
Key facts
- Post-Reasoning improves instruction-tuned models by conditioning them to justify answers after generating the final response.
- The method eliminates additional latency and token costs.
- Evaluated across 117 model–benchmark settings.
- Tested on 13 open and proprietary models.
- Covers 4 model families.
- Evaluated on 9 diverse reasoning and knowledge-intensive benchmarks.
- Benchmarks include AMC, HMMT, GSM8K, and GPQA.
- The approach is simple and effective, requiring only instruction augmentation.
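Since the method requires only instruction augmentation, it can be sketched as a prompt transformation. The exact wording of the paper's augmented instruction is not given here; the suffix text below is an illustrative assumption, not the authors' prompt:

```python
def augment_for_post_reasoning(instruction: str) -> str:
    """Append a post-reasoning directive to a task instruction.

    The directive asks the model to state its final answer first and
    justify it afterwards. The wording is hypothetical; the paper's
    actual instruction text may differ.
    """
    post_reasoning_suffix = (
        "\n\nState your final answer directly first. "
        "After the answer, explain the reasoning that supports it."
    )
    return instruction + post_reasoning_suffix

# Example: augment a GSM8K-style question before sending it to a model.
prompt = augment_for_post_reasoning(
    "A train travels 60 miles in 1.5 hours. What is its average speed?"
)
print(prompt)
```

Because the augmentation happens entirely in the prompt, no fine-tuning or decoding changes are needed, which is consistent with the claim of zero additional inference overhead at answer time.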