LLMs Show Categorical Perception in Digit-Number Boundaries
A recent study reports that large language models (LLMs) exhibit categorical perception (CP) in their hidden-state representations when processing Arabic numerals. Using representational similarity analysis on six models from five architecture families, the researchers found that a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested. The effect is specific to digit-count transitions (at 10 and 100): it is absent at non-boundary control positions and in the temperature domain, where linguistic categories (hot/cold) involve no tokenisation discontinuity. Two qualitative signatures emerged: "classic CP" (Gemma, Qwen), in which discriminability is enhanced at category boundaries, and a distinct pattern in the remaining models. The paper (arXiv 2603.28258) applies principles from perceptual psychology to artificial neural networks.
Key facts
- Categorical perception (CP) is enhanced discriminability at category boundaries.
- The study uses representational similarity analysis across six LLMs from five architecture families.
- A CP-additive model fits better than a purely continuous model at 100% of primary layers in every model tested.
- The effect is specific to digit-count transitions at 10 and 100.
- No effect is found at non-boundary control positions.
- No effect is found in the temperature domain (hot/cold).
- Two qualitative signatures: "classic CP" (Gemma, Qwen) and a distinct pattern in the remaining models.
- The paper is published on arXiv with ID 2603.28258.
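The model comparison above can be sketched in code. The following is a minimal illustration, not the paper's actual method: the variable names, the numeric range, the boost size, and the synthetic "observed" dissimilarities are all assumptions. It shows how a purely continuous log-distance model and a CP-additive model (log-distance plus a boundary-crossing term) can both be fit by least squares and compared by variance explained.

```python
# Hypothetical sketch of a CP-additive vs. continuous model comparison.
# All parameter choices (range 1-199, boost 0.5, noise 0.1) are assumptions
# for illustration only; the paper's actual analysis may differ.
import numpy as np

rng = np.random.default_rng(0)
numbers = np.arange(1, 200)

def crosses_boundary(i, j):
    """True if i and j lie on opposite sides of a digit-count boundary (10 or 100)."""
    return any(min(i, j) < b <= max(i, j) for b in (10, 100))

# Predictors over all unordered pairs of numbers.
pairs = [(i, j) for idx, i in enumerate(numbers) for j in numbers[idx + 1:]]
log_dist = np.array([abs(np.log(i) - np.log(j)) for i, j in pairs])
boundary = np.array([float(crosses_boundary(i, j)) for i, j in pairs])

# Synthetic "observed" dissimilarities containing a genuine boundary boost.
observed = log_dist + 0.5 * boundary + rng.normal(0, 0.1, len(pairs))

def r_squared(predictors, y):
    """Least-squares fit of y on the predictors (plus intercept); return R^2."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_continuous = r_squared([log_dist], observed)          # log-distance only
r2_cp_additive = r_squared([log_dist, boundary], observed)  # plus boundary term
print(f"continuous  R^2 = {r2_continuous:.3f}")
print(f"CP-additive R^2 = {r2_cp_additive:.3f}")
```

Because the synthetic data include a real boundary boost, the CP-additive model explains more variance; the study's finding is that this same ordering held at every primary layer of every model tested.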