Distinguishing Capability Elicitation from Creation in LLM Post-Training
A recent arXiv paper (2605.08368) argues that post-training research on large language models should distinguish capability elicitation from capability creation. The authors contend that the prevailing view of supervised fine-tuning (SFT) as mere imitation and reinforcement learning (RL) as discovery is too coarse. They introduce the notion of accessible support: the set of behaviors a model can produce under finite budgets. Capability elicitation reweights behaviors within this support, while capability creation changes the support itself. The argument is developed from a free-energy perspective, under which both SFT and RL can be interpreted as reweighting a pretrained reference distribution. The paper aims to make the distinction operational for post-training research.
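As a rough illustration of what "reweighting a reference distribution" can mean, the sketch below uses the standard exponential-tilting form p(y) proportional to p_ref(y) * exp(r(y) / beta) on a toy set of behaviors. This form, the toy probabilities and rewards, and the helper names `reweight` and `hit_probability` are assumptions made for illustration; they are not taken from the paper, whose exact formulation may differ.

```python
import math

def reweight(ref, reward, beta):
    """Exponentially tilt a reference distribution.

    Returns p(y) proportional to ref[y] * exp(reward[y] / beta).
    Assumed form for illustration, not the paper's definition.
    """
    weights = {y: p * math.exp(reward[y] / beta) for y, p in ref.items()}
    z = sum(weights.values())  # normalizing constant
    return {y: w / z for y, w in weights.items()}

def hit_probability(p, budget):
    """Chance of sampling a behavior at least once in `budget` independent draws."""
    return 1.0 - (1.0 - p) ** budget

# Hypothetical toy reference distribution over four behaviors.
ref = {"common": 0.899, "rare": 0.1, "very_rare": 0.001, "absent": 0.0}
# Hypothetical rewards; "absent" is rewarded most but has zero reference mass.
reward = {"common": 0.0, "rare": 1.0, "very_rare": 1.0, "absent": 5.0}

tuned = reweight(ref, reward, beta=0.25)

for name in ref:
    print(
        f"{name:10s} ref={ref[name]:.4f} tuned={tuned[name]:.4f} "
        f"hit@8 ref={hit_probability(ref[name], 8):.3f} "
        f"tuned={hit_probability(tuned[name], 8):.3f}"
    )
```

In this toy setting the tilt moves probability mass toward the rewarded "rare" behavior, so it is sampled far more often under a fixed budget of eight draws, which is the flavor of elicitation. The "absent" behavior stays at probability zero however large its reward, because a multiplicative reweighting cannot add mass where the reference has none; reaching it would require changing the support itself.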
Key facts
- Paper distinguishes capability elicitation from capability creation in LLM post-training.
- Introduces the concept of accessible support: behaviors a model can produce under finite budgets.
- Reweighting within accessible support is elicitation; changing support is creation.
- Argues SFT and RL both reweight a pretrained reference distribution.
- Develops argument through a free-energy perspective.
- Critiques coarse view of SFT as imitation and RL as discovery.
- Aims to make the distinction operational for post-training research.
- Published on arXiv with ID 2605.08368.
Entities
Institutions
- arXiv