Crosscoders Track Linguistic Feature Emergence in LLM Pretraining
Researchers have introduced a method that uses sparse crosscoders to track when and how large language models (LLMs) acquire specific linguistic abilities during pretraining. By aligning features across model checkpoints, the method detects the emergence and consolidation of representations such as irregular plural noun detection. A novel metric, Relative Indirect Effects (RelIE), traces the training stages at which individual features become causally important for task performance. The study is conducted on triplets of open-source checkpoints that exhibit significant shifts in performance and representation. This approach addresses a gap in understanding LLM training at the concept level, since traditional benchmarking reveals what a model can do but not how the underlying ability is acquired. The findings shed light on how linguistic abstractions form in neural networks during training.
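For concreteness, below is a minimal sketch of the kind of sparse crosscoder involved, assuming the standard setup from the crosscoder literature: a single shared dictionary of features with per-checkpoint encoder and decoder weights, trained to reconstruct each checkpoint's activations at a fixed layer under an L1 sparsity penalty. All class names, shapes, and hyperparameters here are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class CheckpointCrosscoder(nn.Module):
    """Sketch of a crosscoder over model checkpoints: features are shared,
    encoder/decoder weights are per-checkpoint."""

    def __init__(self, n_checkpoints: int, d_model: int, n_features: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(n_checkpoints, d_model, n_features) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(n_checkpoints, n_features, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(n_features))
        self.b_dec = nn.Parameter(torch.zeros(n_checkpoints, d_model))

    def forward(self, acts: torch.Tensor):
        # acts: (batch, n_checkpoints, d_model) -- the same inputs run through
        # each checkpoint, with activations taken at a fixed layer.
        pre = torch.einsum("bcd,cdf->bf", acts, self.W_enc) + self.b_enc
        feats = torch.relu(pre)                        # shared sparse code
        recon = torch.einsum("bf,cfd->bcd", feats, self.W_dec) + self.b_dec
        return feats, recon

def crosscoder_loss(acts, feats, recon, coder, l1_coeff=1e-3):
    # Reconstruction error summed over checkpoints, plus an L1 penalty
    # weighted by per-checkpoint decoder norms (a common crosscoder choice).
    mse = (recon - acts).pow(2).sum(dim=-1).mean()
    dec_norms = coder.W_dec.norm(dim=-1).sum(dim=0)    # (n_features,)
    sparsity = (feats * dec_norms).sum(dim=-1).mean()
    return mse + l1_coeff * sparsity
```

Because the feature dictionary is shared while decoders are checkpoint-specific, a feature whose decoder norm is negligible at an early checkpoint but substantial at a later one is a natural candidate for a representation that emerged in between.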
Key facts
- Sparse crosscoders are used to discover and align features across LLM checkpoints.
- The method tracks the evolution of linguistic features during pretraining.
- Relative Indirect Effects (RelIE) is a novel metric that traces the training stages at which individual features become causally important for task performance (a hedged sketch follows this list).
- The study uses triplets of open-source checkpoints that exhibit significant performance and representational shifts.
- The case study focuses on detecting the emergence of irregular plural noun detection.
- Traditional evaluation methods such as benchmarking fail to reveal the acquisition process.
- The research bridges a gap in understanding concept-level learning in LLMs.
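The RelIE bullet above is abstract, so here is a hypothetical sketch of how such a metric could be computed. The paper's exact definition is not reproduced here; this assumes the common activation-patching recipe, in which a feature's indirect effect is the change in a task metric (e.g., the logit difference favoring the correct irregular plural form) when that feature is ablated from the crosscoder reconstruction, and "relative" is read as each checkpoint's share of the total effect. The helpers `collect_activations` and `task_metric`, and all names below, are placeholders, not the paper's API.

```python
import torch

def indirect_effect(ckpt_model, coder, ckpt_idx, feature_j, batch,
                    task_metric, collect_activations):
    """Change in the task metric when feature_j is ablated at one checkpoint.

    Assumed helper signatures (placeholders, not from the paper):
      collect_activations(batch) -> (batch, n_checkpoints, d_model)
      task_metric(model, batch, patched_acts) -> float task score, computed
        with `patched_acts` spliced back into the model at the chosen layer.
    """
    acts = collect_activations(batch)                 # stacked checkpoint acts
    feats, recon = coder(acts)                        # CheckpointCrosscoder above
    base = task_metric(ckpt_model, batch, recon[:, ckpt_idx])
    feats_abl = feats.clone()
    feats_abl[:, feature_j] = 0.0                     # knock out one feature
    recon_abl = (torch.einsum("bf,fd->bd", feats_abl, coder.W_dec[ckpt_idx])
                 + coder.b_dec[ckpt_idx])
    return abs(base - task_metric(ckpt_model, batch, recon_abl))

def relie(per_checkpoint_ies):
    """Normalize indirect effects across checkpoints so they sum to 1,
    giving each checkpoint's relative share of a feature's causal effect --
    one plausible reading of 'relative indirect effects'."""
    total = sum(per_checkpoint_ies) or 1.0
    return [ie / total for ie in per_checkpoint_ies]
```

Under this reading, a feature whose RelIE mass is concentrated at a particular checkpoint became causally important for the task around that stage of training.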