LightReasoner: Small Models Teach Large Models Reasoning
Researchers propose LightReasoner, a framework in which a smaller language model (SLM) helps teach a larger language model (LLM) to reason by identifying high-value reasoning moments. The method exploits the behavioral divergence between a stronger expert model (the LLM) and a weaker amateur model (the SLM). It operates in two stages: a sampling stage that uses expert-amateur contrast to pinpoint critical reasoning moments and construct supervision examples from them, and a fine-tuning stage that aligns the LLM using those examples. The approach targets the resource cost of conventional supervised fine-tuning by reducing reliance on large curated datasets and on uniform optimization across all tokens. The paper is available on arXiv under ID 2510.07962.
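The sampling stage can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that "behavioral divergence" is measured as the KL divergence between the two models' next-token distributions, and that a fixed threshold decides which steps are kept. The function names, the threshold value, and the toy distributions are all hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def select_critical_steps(expert_dists, amateur_dists, threshold=0.4):
    """Return (step_index, divergence) for steps where expert and amateur disagree strongly.

    The threshold is an assumed hyperparameter chosen for this toy example.
    """
    selected = []
    for step, (p, q) in enumerate(zip(expert_dists, amateur_dists)):
        d = kl_divergence(p, q)
        if d > threshold:
            selected.append((step, d))
    return selected

# Toy next-token distributions over a 3-token vocabulary at 3 reasoning steps.
expert = [[0.7, 0.2, 0.1], [0.9, 0.05, 0.05], [0.34, 0.33, 0.33]]
amateur = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.33, 0.34, 0.33]]

critical = select_critical_steps(expert, amateur)
for step, d in critical:
    print(f"step {step}: divergence {d:.3f}")
```

In this toy run only step 1 exceeds the threshold, since it is the one position where the expert strongly prefers a token the amateur considers unlikely. Under the same assumptions, the fine-tuning stage would train the LLM only on supervision examples built from such selected steps rather than on every token.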
Key facts
- LightReasoner leverages behavioral divergence between a stronger expert model (LLM) and a weaker amateur model (SLM).
- The framework operates in two stages: sampling critical reasoning moments and fine-tuning.
- It uses expert-amateur contrast to pinpoint high-value reasoning moments.
- The approach addresses the resource cost of supervised fine-tuning by reducing reliance on large curated datasets and uniform token optimization.
- The paper is available on arXiv with ID 2510.07962.
- The work explores whether smaller models can teach larger models to reason.
- Supervised fine-tuning (SFT) traditionally relies on large curated datasets and rejection-sampled demonstrations.
- Only a fraction of the tokens used in SFT carry meaningful learning value.
Entities
Institutions
- arXiv