Five Lines of Code Reveal LLMs' Secret Semantic Dictionaries

ai-technology · 2026-05-26

A recent study published on arXiv (2605.22005) shows that applying singular value decomposition (SVD) to the lm_head weight matrix of a transformer-based large language model (LLM) can uncover interpretable semantic subspaces directly from the model weights, using only five lines of PyTorch code and no inference. Each left singular vector highlights the vocabulary tokens most likely to be chosen when the hidden state aligns with the corresponding singular direction, which sheds light on the composition and curation philosophy of the training data. The authors analyzed GPT-OSS-120B, Gemma-2-2B, and Qwen2.5-1.5B and found distinct patterns: GPT-OSS-120B exhibits a structured hierarchy of differentiated subspaces, Gemma-2-2B is influenced by pre-nineteenth-century English orthography, and Qwen2.5-1.5B offers extensive multilingual representation. The approach can therefore surface unintended patterns in training data from the weights alone.
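The idea can be sketched as follows. This is a minimal illustration, not the authors' five lines: the function name top_tokens_per_direction, the toy vocabulary, and the random 50 x 8 matrix are all hypothetical stand-ins, and the weight matrix is assumed to have shape (vocab_size, hidden_dim), so that the left singular vectors carry one entry per token.

```python
import torch

def top_tokens_per_direction(weight, vocab, n_directions=3, top_k=5):
    """Hypothetical helper: list the highest-loading tokens of the leading
    left singular vectors of a (vocab_size, hidden_dim) weight matrix."""
    # Reduced SVD: weight = U @ diag(S) @ Vh, with U of shape (vocab_size, r).
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    groups = []
    for i in range(n_directions):
        # Column i of U holds one loading per vocabulary token.
        # Singular vectors are only defined up to sign, so the most
        # negative entries can be just as informative as the most positive.
        idx = torch.topk(U[:, i], k=top_k).indices.tolist()
        groups.append([vocab[j] for j in idx])
    return S, groups

# Toy stand-in for a real lm_head matrix (assumption: random data,
# so the resulting token groups carry no meaning).
torch.manual_seed(0)
vocab = [f"tok{i}" for i in range(50)]
W = torch.randn(50, 8)

S, groups = top_tokens_per_direction(W, vocab)
for i, tokens in enumerate(groups):
    print(f"direction {i}: singular value {S[i].item():.3f}, tokens {tokens}")
```

With a real model, W would be replaced by the output-projection weights and vocab by the tokenizer's token strings. In many Hugging Face causal LM classes the matrix is exposed as model.lm_head.weight, though that depends on how the model is loaded and should be treated as an assumption here.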

Key facts

Method uses SVD of lm_head weight matrix
Requires only five lines of PyTorch code
No model inference needed
Applied to GPT-OSS-120B, Gemma-2-2B, Qwen2.5-1.5B
GPT shows graduated hierarchy of subspaces
Gemma dominated by pre-19th-century English orthography
Qwen exhibits broad multilingual coverage
Paper available on arXiv: 2605.22005
