LLMs Show Compositional Literary Primitives via Sparse Autoencoders
A paper on arXiv (2605.18808) characterizes compositional literary primitives in instruction-tuned large language models by applying sparse autoencoders to mid-depth residual streams. The authors studied Llama 3.1 8B-Instruct and Gemma 2 9B-IT and identified four feature classes: naming-gates that boost lexical tokens of the targeted emotion, an eleven-self cluster of first-person register features, stylistic register modulators (including show-don't-tell and defamiliarization), and compositional emotions that emerge only under multi-feature steering. In a forced-choice evaluation with a 5-LLM judge panel over the 27-category Cowen-Keltner emotion taxonomy, Llama covered all 27 categories (27/27), while Gemma covered 23/27, with adoration as the only strict failure. Under random judging, the per-cell pass probability is about 10^{-3}; the paper also estimates the expected two-seed false-positive cell count across the catalog.
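To make "multi-feature steering" concrete, here is a minimal sketch of one common way to steer with sparse-autoencoder features: add a weighted sum of SAE decoder directions to a residual-stream vector. This is an illustration under stated assumptions, not the paper's procedure. The function name `steer_residual`, the toy 4-dimensional decoder, and the feature indices and strengths are all hypothetical.

```python
from typing import List, Sequence


def steer_residual(
    residual: Sequence[float],
    decoder_directions: Sequence[Sequence[float]],
    feature_ids: Sequence[int],
    strengths: Sequence[float],
) -> List[float]:
    """Return residual + sum_i strengths[i] * decoder_directions[feature_ids[i]].

    Assumption: each SAE feature has a decoder direction in residual-stream
    space, and steering means adding scaled copies of those directions.
    """
    if len(feature_ids) != len(strengths):
        raise ValueError("feature_ids and strengths must have the same length")
    steered = list(residual)  # copy so the input vector is left unchanged
    for fid, alpha in zip(feature_ids, strengths):
        direction = decoder_directions[fid]
        if len(direction) != len(steered):
            raise ValueError("decoder direction and residual differ in size")
        for j, weight in enumerate(direction):
            steered[j] += alpha * weight
    return steered


# Toy example: three hypothetical features in a 4-dimensional residual stream.
decoder = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
h = [0.1, 0.2, 0.3, 0.4]

# Steer two features at once; a "compositional" effect would be one that
# appears only when both are active together.
steered = steer_residual(h, decoder, feature_ids=[0, 2], strengths=[2.0, 4.0])
print(steered)
```

In a real model the same addition would be applied to the hidden state at the chosen layer during generation; which layer, which features, and which strengths the authors used is not stated in this summary.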
Key facts
- arXiv paper 2605.18808 characterizes compositional literary primitives in LLMs.
- Models studied: Llama 3.1 8B-Instruct and Gemma 2 9B-IT.
- Sparse autoencoders applied to mid-depth residual streams.
- Four feature classes: naming-gates, eleven-self cluster, stylistic modulators, compositional emotions.
- Llama achieved full 27/27 emotion coverage under a forced-choice 5-LLM judge panel.
- Gemma reached 23/27 coverage, with adoration as the only strict failure.
- Cowen-Keltner 27-category emotion taxonomy used.
- Per-cell pass probability under random judging: ~10^{-3}.
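The summary does not say how the random-judging figure was derived or what counts as a pass. As a rough sanity check, the sketch below assumes each of the 5 judges picks uniformly and independently among the 27 categories, and that a cell passes when a majority (at least 3 of 5) pick the target. Both the pass rule and the function name `random_pass_probability` are assumptions made for illustration.

```python
from math import comb


def random_pass_probability(
    n_categories: int = 27, n_judges: int = 5, votes_needed: int = 3
) -> float:
    """Binomial tail: chance that at least `votes_needed` of `n_judges`
    uniformly random judges pick the one target category."""
    p = 1.0 / n_categories
    return sum(
        comb(n_judges, k) * p**k * (1.0 - p) ** (n_judges - k)
        for k in range(votes_needed, n_judges + 1)
    )


per_cell = random_pass_probability()
print(f"{per_cell:.2e}")  # prints 4.80e-04
```

Under these assumptions the chance rate is about 4.8 × 10^{-4} per cell for a single run, the same order of magnitude as the reported ~10^{-3}. A different pass rule or judge model would change the number.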
Entities
Institutions
- arXiv
- Llama 3.1 8B-Instruct
- Gemma 2 9B-IT
- Cowen-Keltner