Language Models Learn Number Representations via Periodic Features
A study published on arXiv reports that language models trained on natural text represent numbers using periodic features, with dominant periods at T = 2, 5, and 10. The research identifies a two-tiered hierarchy: Transformers, Linear RNNs, LSTMs, and classical word embeddings all learn features with period-T spikes in the Fourier domain, but only some of them achieve geometrically separable features that support linear mod-T classification. The authors prove that Fourier sparsity is necessary but not sufficient for geometric separability. Empirically, data, architecture, optimizer, and tokenizer all influence whether a model acquires geometrically separable features, which can arise via two routes, one of them a complementary co-occurrence signal.
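The period-T spikes are a claim about the discrete Fourier transform of the embedding map along the number line. Below is a minimal sketch of the detection step, assuming synthetic embeddings with planted periodic components in place of a trained model's (the matrix E, its dimensions, and the planted signals are illustrative choices, not the paper's setup):

```python
# A minimal sketch, assuming synthetic embeddings: plant components with
# periods 2, 5, and 10 in a random embedding matrix for the integers
# 0..N-1, then recover them as peaks in the averaged power spectrum.
import numpy as np

N, d = 200, 64                        # integers 0..N-1, embedding dim d
rng = np.random.default_rng(0)

# Hypothetical stand-in for learned number embeddings.
n = np.arange(N)[:, None]             # shape (N, 1)
periods = np.array([2.0, 5.0, 10.0])
E = rng.normal(scale=0.1, size=(N, d))
E[:, :3] += np.cos(2 * np.pi * n / periods)   # plant period-T features

# DFT along the number axis (per-dimension mean removed to kill the DC
# bin), then average the power spectrum over embedding dimensions.
spectrum = np.abs(np.fft.rfft(E - E.mean(axis=0), axis=0)) ** 2
power = spectrum.mean(axis=1)
freqs = np.fft.rfftfreq(N)            # cycles per unit step along n

top = np.argsort(power[1:])[::-1][:3] + 1     # three largest non-DC bins
for k in sorted(top):
    print(f"spike at frequency {freqs[k]:.3f} (period ~ {1 / freqs[k]:.1f})")
```

On the planted signal this prints spikes at periods 10, 5, and 2; on real number embeddings, the same averaged spectrum is where the dominant periods the paper reports would appear.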
Key facts
- Language models learn periodic features with dominant periods T=2, 5, 10
- Transformers, Linear RNNs, LSTMs, and word embeddings all show Fourier spikes
- Only some models learn geometrically separable features for linear mod-T classification
- Fourier-domain sparsity is necessary but not sufficient for geometric separability (see the probe sketch after this list)
- Data, architecture, optimizer, and tokenizer affect feature learning
- Two routes to acquiring geometrically separable features are identified
- A complementary co-occurrence signal is one such route
- Study published on arXiv with ID 2604.20817
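To make the two tiers concrete, the sketch below contrasts two synthetic feature sets under a linear mod-T probe (scikit-learn's LogisticRegression is an implementation choice here, not the authors' method). A (cos, sin) pair of period T places each residue class at a distinct point on a circle, which a linear probe separates; a single cosine dimension carries an equally sharp Fourier spike at period T, but cosine is even, so residues r and T - r collapse onto the same value and the probe cannot reach full accuracy. That gap is the sense in which Fourier sparsity is necessary but not sufficient for geometric separability.

```python
# A hedged sketch of a linear mod-T probe on synthetic embeddings
# (not the paper's models or data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

N, d, T = 200, 64, 10
rng = np.random.default_rng(1)
n = np.arange(N)
y = n % T                             # residue-class labels

# Separable case: n sits on a (cos, sin) circle of period T, so each
# residue class is a distinct cluster a linear probe can pick out.
circle = np.stack([np.cos(2 * np.pi * n / T),
                   np.sin(2 * np.pi * n / T)], axis=1)
X_circle = np.concatenate(
    [circle, rng.normal(scale=0.1, size=(N, d - 2))], axis=1)

# Non-separable case: one cosine dimension has the same sharp Fourier
# spike at period T, but cos is even, so residues r and T - r land on
# the same value and no linear probe can tell them apart.
X_cos = np.cos(2 * np.pi * n / T)[:, None]

for name, feats in [("circle", X_circle), ("cos-only", X_cos)]:
    Xtr, Xte, ytr, yte = train_test_split(
        feats, y, test_size=0.3, random_state=0, stratify=y)
    probe = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    print(f"{name:8s} mod-{T} probe accuracy: {probe.score(Xte, yte):.2f}")
```

The circle features score near 1.0 while the cosine-only feature plateaus well below it, despite both having the same dominant Fourier period.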
Entities
Institutions
- arXiv