LARK: A Learnability-Based Method for Selecting Reasoning Trajectories in Distillation
LARK is a method for selecting trajectories in reasoning distillation, described in an arXiv paper (2605.30651). Unlike heuristic methods that rely on trajectory quality or model confidence, LARK selects trajectories according to how learnable they are for the student model. It identifies trajectories that the student can learn efficiently while preserving the generalization of the full training distribution. Central to LARK is a learnability factor ρ, which characterizes the rate at which the student's training loss decreases. To estimate this rate efficiently, the authors introduce a learnability proxy, together with a χ²-regularized selection policy that balances learnability against distributional coverage. Both come with theoretical guarantees on estimation error. By favoring trajectories the student can actually learn from, rather than relying on quality or confidence alone, the method could improve the efficiency and effectiveness of student training.
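To make the idea of a loss decrease rate concrete, here is a minimal Python sketch. It is not the paper's definition of ρ or its learnability proxy, neither of which is given in this summary. As an illustrative assumption, it treats a student's per-trajectory training loss as decaying roughly exponentially, L_t ≈ L_0 · exp(-ρ t), and estimates ρ as the negated least-squares slope of log-loss against the training step. The function name `estimate_decay_rate` is hypothetical.

```python
import math


def estimate_decay_rate(losses):
    """Estimate a loss decay rate from a sequence of positive loss values.

    Assumption (not from the paper): loss_t ~ loss_0 * exp(-rho * t),
    so log(loss_t) is linear in t with slope -rho. We fit that slope
    by ordinary least squares and return its negation.
    """
    n = len(losses)
    if n < 2:
        raise ValueError("need at least two loss values")
    if any(loss <= 0 for loss in losses):
        raise ValueError("losses must be strictly positive")

    logs = [math.log(loss) for loss in losses]
    mean_t = (n - 1) / 2.0          # mean of steps 0, 1, ..., n-1
    mean_y = sum(logs) / n

    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(logs))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return -cov / var               # larger value = faster loss decrease


# Synthetic loss curves: one drops quickly, one drops slowly.
fast = [2.0 * math.exp(-0.30 * t) for t in range(10)]
slow = [2.0 * math.exp(-0.05 * t) for t in range(10)]
rho_fast = estimate_decay_rate(fast)
rho_slow = estimate_decay_rate(slow)
```

Under this assumption, a trajectory whose loss falls faster receives a larger estimated rate. Computing such a rate directly requires training on each trajectory, which is presumably why the paper introduces a cheaper proxy.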
Key facts
- LARK is a learnability-grounded method for reasoning trajectory selection.
- It selects trajectories that the student can learn efficiently.
- The method preserves generalization of the full training distribution.
- Core concept: the learnability factor ρ characterizes the rate at which the student's training loss decreases.
- Introduces a learnability proxy for efficient estimation.
- Uses χ²-regularized selection policy to balance learnability and coverage.
- Provides strong theoretical guarantees on estimation error.
- Published on arXiv with ID 2605.30651.
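The trade-off between learnability and coverage can be illustrated with a generic χ²-regularized reweighting. This is a sketch under stated assumptions, not the paper's selection policy: it assumes each trajectory already has a scalar learnability score, takes the uniform distribution u as the reference for coverage, and maximizes Σ w_i s_i − (λ/2) · χ²(w ‖ u) over the probability simplex, where χ²(w ‖ u) = Σ (w_i − u_i)² / u_i. The optimum has the form w_i = u_i · max(0, 1 + (s_i − ν)/λ), with ν chosen so the weights sum to 1. The name `chi2_regularized_weights` and the parameter `lam` are hypothetical.

```python
def chi2_regularized_weights(scores, lam, iters=100):
    """Selection weights trading off score against chi-square divergence.

    Assumption (not from the paper): the reference distribution is uniform
    and the objective is sum(w * s) - (lam / 2) * chi2(w || uniform).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must be non-empty")
    if lam <= 0:
        raise ValueError("lam must be positive")
    u = 1.0 / n

    def weights_for(nu):
        # Closed-form maximizer for a fixed multiplier nu.
        return [u * max(0.0, 1.0 + (s - nu) / lam) for s in scores]

    # The total mass is non-increasing in nu: it is >= 1 at min(scores)
    # and 0 at max(scores) + lam, so bisection finds the normalizing nu.
    lo, hi = min(scores), max(scores) + lam
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(weights_for(mid)) > 1.0:
            lo = mid
        else:
            hi = mid

    w = weights_for((lo + hi) / 2.0)
    total = sum(w)
    return [x / total for x in w]   # remove residual bisection error


scores = [0.1, 0.5, 0.9]
w_broad = chi2_regularized_weights(scores, lam=10.0)   # strong regularization
w_sharp = chi2_regularized_weights(scores, lam=0.01)   # weak regularization
```

With a large `lam` the weights stay close to uniform, which preserves coverage of the full distribution. With a small `lam` the weights concentrate on the highest-scoring trajectories and low-scoring ones are dropped entirely.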
Entities
Institutions
- arXiv