LLMs Show Compositional Literary Primitives via Sparse Autoencoders

ai-technology · 2026-05-20

A paper on arXiv (2605.18808) characterizes compositional literary primitives in instruction-tuned large language models by applying sparse autoencoders to mid-depth residual streams. The authors studied Llama 3.1 8B-Instruct and Gemma 2 9B-IT and identified four feature classes: naming-gates that boost lexical tokens of the intended emotion, an eleven-self cluster of first-person register traits, stylistic register modulators (including show-don't-tell and defamiliarization), and compositional emotions that emerge only under multi-feature steering. In a forced-choice evaluation with a 5-LLM judge panel over the 27-category Cowen-Keltner emotion taxonomy, Llama reached full 27/27 coverage, while Gemma reached 23/27, with adoration the only strict failure. Under random judging, the per-cell pass probability is roughly 10^{-3}, from which the paper derives an expected two-seed false-positive cell count across the catalog.
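As a rough illustration of what multi-feature steering means mechanically, the sketch below adds a weighted sum of sparse-autoencoder decoder directions to a single residual-stream vector. It is a toy under stated assumptions, not the paper's implementation: the function name `steer_residual`, the 4-dimensional vectors, the feature indices, and the steering strengths are all hypothetical, and real residual streams have thousands of dimensions.

```python
from typing import Dict, List, Sequence


def steer_residual(
    resid: Sequence[float],
    decoder: Sequence[Sequence[float]],
    strengths: Dict[int, float],
) -> List[float]:
    """Return resid plus a weighted sum of SAE decoder directions.

    decoder[i] is the (hypothetical) decoder direction for feature i;
    strengths maps a feature index to its steering coefficient.
    """
    steered = list(resid)
    for feature_idx, alpha in strengths.items():
        direction = decoder[feature_idx]
        if len(direction) != len(steered):
            raise ValueError("decoder direction and residual differ in size")
        for j, weight in enumerate(direction):
            steered[j] += alpha * weight
    return steered


# Toy setup: a 4-dimensional residual and three made-up feature directions.
decoder = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
resid = [0.1, 0.2, 0.3, 0.4]

# Steering on one feature versus steering on two features at once.
single = steer_residual(resid, decoder, {0: 2.0})
combined = steer_residual(resid, decoder, {0: 2.0, 2: 4.0})
```

In this toy the combined intervention is simply the sum of the individual ones. The summary's claim about compositional emotions concerns the model's output downstream of such an intervention, which this sketch does not model.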

Key facts

arXiv paper 2605.18808 characterizes compositional literary primitives in LLMs.
Models studied: Llama 3.1 8B-Instruct and Gemma 2 9B-IT.
Sparse autoencoders applied on mid-depth residual streams.
Four feature classes: naming-gates, eleven-self cluster, stylistic modulators, compositional emotions.
Llama achieved full 27/27 emotion coverage under a forced-choice 5-LLM judge panel.
Gemma reached 23/27 coverage, with adoration as the only strict failure.
Cowen-Keltner 27-category emotion taxonomy used.
Per-cell pass probability under random judging: ~10^{-3}.
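The random-judging baseline can be sanity-checked with a small binomial calculation. The sketch below assumes a pass rule that the summary does not state: each of the 5 judges independently picks one of the 27 categories uniformly at random, and a cell passes when at least 3 judges pick the target. The function name `random_pass_probability` and the `min_votes` threshold are hypothetical, so the paper's actual rule may give a different figure.

```python
from math import comb


def random_pass_probability(
    n_categories: int = 27,
    n_judges: int = 5,
    min_votes: int = 3,  # assumed majority threshold, not from the paper
) -> float:
    """Probability that at least min_votes of n_judges pick the target
    category when every judge guesses uniformly at random."""
    p = 1.0 / n_categories
    return sum(
        comb(n_judges, k) * p**k * (1.0 - p) ** (n_judges - k)
        for k in range(min_votes, n_judges + 1)
    )


per_cell = random_pass_probability()
print(f"per-cell pass probability under random judging: {per_cell:.2e}")
```

Under these assumptions the result is a little under 5 x 10^{-4}, the same rough order of magnitude as the ~10^{-3} quoted above. A stricter or looser pass rule would shift it accordingly.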
