LightReasoner: Small Models Teach Large Models Reasoning

ai-technology · 2026-05-23

Researchers propose LightReasoner, a framework in which a smaller language model (SLM) helps teach a larger language model (LLM) to reason by exposing high-value reasoning moments. The method exploits the behavioral divergence between a stronger expert model (the LLM) and a weaker amateur model (the SLM). It operates in two stages: a sampling stage that uses the expert-amateur contrast to pinpoint critical reasoning moments and build supervision examples from them, and a fine-tuning stage that aligns the LLM with those examples. The approach targets the cost of conventional supervised fine-tuning, which relies on large curated datasets and optimizes all tokens uniformly. The paper is available on arXiv under ID 2510.07962.
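The sampling stage can be pictured with a small sketch. The summary above does not say how divergence is measured, so this example assumes KL divergence between the expert's and the amateur's next-token distributions and a fixed cutoff. The function names, the threshold value, and the toy distributions are all hypothetical and are not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def select_critical_steps(expert_dists, amateur_dists, threshold):
    """Return (step index, divergence) pairs for steps where the models disagree strongly.

    Assumption: a step counts as "critical" when its KL divergence exceeds a threshold.
    """
    selected = []
    for step, (p, q) in enumerate(zip(expert_dists, amateur_dists)):
        d = kl_divergence(p, q)
        if d > threshold:
            selected.append((step, d))
    return selected

# Toy next-token distributions over a 3-token vocabulary at three generation steps.
expert = [[0.70, 0.20, 0.10], [0.90, 0.05, 0.05], [0.34, 0.33, 0.33]]
amateur = [[0.68, 0.22, 0.10], [0.20, 0.40, 0.40], [0.33, 0.34, 0.33]]

critical = select_critical_steps(expert, amateur, threshold=0.1)
print([step for step, _ in critical])  # prints [1]
```

In this toy run the two models nearly agree at steps 0 and 2, so only step 1, where the expert is confident and the amateur is not, would be kept as a supervision example.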

Key facts

LightReasoner leverages behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM).
The framework operates in two stages: sampling critical reasoning moments and fine-tuning.
It uses expert-amateur contrast to pinpoint high-value reasoning moments.
The approach reduces reliance on resource-intensive supervised fine-tuning.
The paper is available on arXiv with ID 2510.07962.
The method explores whether smaller models can teach larger models to reason.
Supervised fine-tuning (SFT) traditionally relies on large curated datasets and rejection-sampled demonstrations.
Only a fraction of tokens in SFT carry meaningful learning value.
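
The contrast between uniform token optimization and training on only the informative tokens can be sketched as follows. This is a stand-in that uses a plain masked negative log-likelihood, not the paper's training objective, which the summary does not describe. The function names and the toy probabilities are hypothetical.

```python
import math

def token_nll(probs, target):
    """Negative log-likelihood of the target token under one predicted distribution."""
    return -math.log(probs[target])

def uniform_loss(model_probs, targets):
    """Average loss over every token, as in uniform token optimization."""
    losses = [token_nll(p, t) for p, t in zip(model_probs, targets)]
    return sum(losses) / len(losses)

def selective_loss(model_probs, targets, selected_steps):
    """Average loss over the selected steps only (assumed to be the high-value ones)."""
    chosen = set(selected_steps)
    losses = [
        token_nll(p, t)
        for i, (p, t) in enumerate(zip(model_probs, targets))
        if i in chosen
    ]
    return sum(losses) / len(losses) if losses else 0.0

# Toy predictions over a 2-token vocabulary; step 1 is the only uncertain one.
model_probs = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
targets = [0, 1, 0]

print(round(uniform_loss(model_probs, targets), 4))
print(round(selective_loss(model_probs, targets, selected_steps=[1]), 4))
```

Under the selective variant, the easy steps 0 and 2 contribute nothing to the loss, so the training signal comes entirely from the step flagged as informative.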
