Language Model Features Have Life Histories That Matter

ai-technology · 2026-05-20

A recent arXiv preprint (2605.18789) reports that language model features have distinct life histories: they emerge, persist, and in some cases die out during training. The researchers examined Pythia-160M and Pythia-410M and found a 'carrier scaffold' of around 50 sparse features with stable life histories that shape the model's representational framework. The scaffold forms early. Features appear, disappear, and reorganize approximately 40 times faster in the first 1% of training than in later stages, and the scaffold is mostly established by the end of that window. Joint cross-layer ablation shows that these carriers are significantly more load-bearing than matched non-scaffold features, a distinction that single-feature firing methods do not reveal. In addition, which features will become carriers can be predicted early in training. The work argues that a feature's life history matters for understanding model behavior and for interpretability.

Key facts

arXiv preprint 2605.18789 examines feature life histories in language models.
Features in language models emerge, persist, and die during training.
Study focuses on Pythia-160M and Pythia-410M models.
A 'carrier scaffold' of ~50 sparse features with stable life histories was identified.
The scaffold assembles early, with feature dynamics ~40x faster in the first 1% of training than later.
Joint cross-layer ablation reveals carriers are more load-bearing than non-scaffold features.
Which features will become carriers can be predicted early in training.
Research highlights the importance of feature life history for interpretability.
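
To make the idea of a feature "life history" concrete, here is a minimal Python sketch. It is not the preprint's method. It assumes each feature has already been reduced to a present/absent record at every saved checkpoint, and all names (LifeHistory, life_history, turnover_rate) and the toy data are hypothetical, chosen only for illustration. It shows how birth, death, and lifespan could be read off such a record, and how the rate of appearance and disappearance events per training step could be compared between an early and a late window.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LifeHistory:
    birth: Optional[int]  # index of the first checkpoint where the feature is active
    death: Optional[int]  # index right after its last active checkpoint; None if active at the end
    lifespan: int         # number of checkpoints at which the feature is active


def life_history(active: List[bool]) -> LifeHistory:
    """Summarize one feature's present/absent record across checkpoints."""
    on = [i for i, is_on in enumerate(active) if is_on]
    if not on:
        return LifeHistory(birth=None, death=None, lifespan=0)
    last = on[-1]
    death = last + 1 if last + 1 < len(active) else None
    return LifeHistory(birth=on[0], death=death, lifespan=len(on))


def turnover_rate(records: Dict[str, List[bool]], steps: List[int],
                  start: int, end: int) -> float:
    """Appearance/disappearance events per training step between two checkpoint indices."""
    events = 0
    for active in records.values():
        for i in range(start + 1, end + 1):
            if active[i] != active[i - 1]:  # the feature switched on or off
                events += 1
    return events / (steps[end] - steps[start])


# Toy data (invented): training step of each checkpoint and per-feature activity.
steps = [0, 100, 200, 1000, 10000]
records = {
    "f0": [False, True, True, True, True],     # appears early and persists
    "f1": [False, True, False, False, False],  # appears and dies early
    "f2": [False, False, True, True, False],   # disappears late
}

histories = {name: life_history(active) for name, active in records.items()}
early = turnover_rate(records, steps, 0, 2)  # checkpoints 0..2
late = turnover_rate(records, steps, 2, 4)   # checkpoints 2..4
print(histories)
print("early/late turnover ratio:", early / late)
```

Normalizing by training steps rather than by checkpoint count matters when checkpoints are spaced unevenly, which is assumed here. A real analysis would also need a criterion for when a feature counts as active and a way to match features across checkpoints, neither of which this sketch addresses.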
