ReAD: Reinforcement-Guided Capability Distillation for LLMs
A new framework called ReAD (Reinforcement-guided cApability Distillation) addresses the challenge of compressing large language models (LLMs) into smaller ones while preserving task-specific abilities. Current capability distillation methods treat capabilities as independent training targets, ignoring how improving one capability affects others. ReAD explicitly models capability interdependence under a fixed token budget and uses reinforcement learning to guide the distillation process. The approach builds on two empirical observations: distillation induces systematic cross-capability transfer whose strength depends on the token budget, and additional budget often yields diminishing task-relevant gains while degrading other abilities. By inferring which capabilities are essential to the target task and steering distillation toward them, ReAD aims to produce smaller models that are both more efficient and more effective on downstream tasks.
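The summary above does not spell out ReAD's reward design, but the stated ideas (reward task-essential capability gains, penalize negative cross-capability transfer, respect a fixed token budget) can be sketched concretely. The snippet below is a minimal illustration under those assumptions; the capability names, weights, and the `distillation_reward` function are hypothetical and not the paper's actual implementation.

```python
"""Minimal sketch of a budget-aware, capability-interdependence-aware reward.

All names and numbers are illustrative assumptions, not ReAD's actual API.
"""

CAPABILITIES = ["math_reasoning", "code_generation", "summarization", "dialogue"]
TASK_ESSENTIAL = {"math_reasoning", "code_generation"}  # assumed to be inferred upstream


def distillation_reward(before, after, tokens_used, token_budget,
                        alpha=1.0, beta=0.5, gamma=0.2):
    """Score one distillation step.

    - Rewards gains on task-essential capabilities.
    - Penalizes regressions on the remaining capabilities (negative transfer).
    - Penalizes spending beyond the fixed token budget.
    The weights alpha/beta/gamma are placeholders, not values from the paper.
    """
    gain = sum(after[c] - before[c] for c in TASK_ESSENTIAL)
    regression = sum(max(0.0, before[c] - after[c])
                     for c in CAPABILITIES if c not in TASK_ESSENTIAL)
    overuse = max(0.0, tokens_used - token_budget) / token_budget
    return alpha * gain - beta * regression - gamma * overuse


# Illustrative usage with made-up evaluation scores on a 0-1 scale.
before = {"math_reasoning": 0.42, "code_generation": 0.38,
          "summarization": 0.61, "dialogue": 0.58}
after = {"math_reasoning": 0.51, "code_generation": 0.44,
         "summarization": 0.57, "dialogue": 0.58}
print(distillation_reward(before, after, tokens_used=9e8, token_budget=1e9))
```

In a reinforcement-learning loop, a reward of this shape would score candidate distillation runs (or token allocations), so the policy learns to spend budget where task-essential capabilities improve without eroding the others.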
Key facts
- ReAD is a Reinforcement-guided cApability Distillation framework for LLMs.
- It addresses capability interdependence in knowledge distillation.
- Current methods treat capabilities as independent training targets.
- Distillation induces systematic, budget-dependent cross-capability transfer.
- Additional budget often brings limited task-relevant gains.
- Extra budget can sometimes degrade other useful abilities.
- ReAD explicitly accounts for capability interdependence.
- The framework uses reinforcement learning to guide distillation.