Reasoning LLMs Reduce Tokens with Stored Skills
A new technique for reasoning with large language models (LLMs) summarizes reusable reasoning skills, distilled from thorough deliberation and trial-and-error, and stores them for later use. These skills are then retrieved at inference time to guide future reasoning. In contrast to the conventional 'reasoning from scratch' paradigm, the method retrieves skills relevant to each query, letting the model skip unproductive detours and focus on solution strategies that work. Evaluated on coding and mathematical reasoning tasks, the technique markedly reduces the number of reasoning tokens while improving overall performance. The lower cost per request points to significant practical and economic advantages for real-world deployment.
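The retrieve-then-reason loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the skill library, the keyword-overlap retriever, and all names (`Skill`, `retrieve`, `build_prompt`) are hypothetical stand-ins for whatever storage and retrieval machinery the actual method uses.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str   # what kinds of problems the skill applies to
    guidance: str      # condensed reasoning steps distilled from past solutions

def tokenize(text: str) -> set[str]:
    return {w.lower().strip(".,") for w in text.split()}

def retrieve(query: str, library: list[Skill], k: int = 2) -> list[Skill]:
    """Rank stored skills by keyword overlap with the query (a toy retriever)."""
    scored = sorted(
        library,
        key=lambda s: len(tokenize(query) & tokenize(s.description)),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, skills: list[Skill]) -> str:
    """Prepend retrieved skills so the model need not re-derive them from scratch."""
    hints = "\n".join(f"- {s.name}: {s.guidance}" for s in skills)
    return f"Relevant skills:\n{hints}\n\nQuestion: {query}"

# Hypothetical skill library accumulated from earlier exploration.
library = [
    Skill("two-pointer", "array pair sum sorted search",
          "walk pointers inward from both ends of a sorted array"),
    Skill("modular-arithmetic", "remainder divisibility math number",
          "reduce intermediate results mod m to keep numbers small"),
]

query = "Find a pair in a sorted array that sums to a target"
prompt = build_prompt(query, retrieve(query, library, k=1))
print(prompt)
```

A production system would likely use embedding-based retrieval rather than keyword overlap, but the shape is the same: the prompt arrives at the model already carrying distilled strategies, which is what allows the shorter reasoning traces the summary describes.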
Key facts
- Method summarizes and stores reusable reasoning skills from deliberation and trial-and-error.
- Skills are retrieved at inference time to guide future reasoning.
- Contrasts with 'reasoning from scratch' paradigm.
- Evaluated on coding and mathematical reasoning tasks.
- Significantly reduces reasoning tokens.
- Improves overall performance.
- Lower per-request cost indicates practical and economic potential.
- Proposed by authors on arXiv (submission history not specified).
Entities
Institutions
- arXiv