ARTFEED — Contemporary Art Intelligence

LLMs Show Categorical Perception in Digit-Number Boundaries

ai-technology · 2026-04-27

A recent study reports that large language models (LLMs) show categorical perception (CP) in their hidden-state representations of Arabic numerals. Using representational similarity analysis on six models from five architecture families, the researchers found that a CP-additive model (log-distance plus a boundary boost) fits the representational geometry better than a purely continuous model at 100% of primary layers in every model tested. The effect is tied specifically to digit-count transitions (10 and 100): it is absent at non-boundary control positions and in the temperature domain, where linguistic categories (hot/cold) involve no tokenisation discontinuity. Two qualitative signatures emerged: "classic CP" (Gemma, Qwen), in which discriminability is enhanced at category boundaries, and a second pattern in the remaining models. The findings, posted on arXiv (2603.28258), bring principles from perceptual psychology to bear on artificial neural networks.
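The core comparison can be sketched in a few lines. The following is a minimal illustrative toy, not the paper's actual pipeline: it fits a purely continuous predictor (log-distance between numbers) and a CP-additive predictor (log-distance plus an indicator for crossing a digit-count boundary such as 9→10 or 99→100) to simulated pairwise dissimilarities, then compares fit quality. The simulated data, number range, and least-squares fitting are all assumptions for illustration.

```python
import numpy as np

# Toy comparison of a continuous vs. a CP-additive dissimilarity model.
# All specifics here (number range, noise level, fitting method) are
# illustrative assumptions, not taken from the paper.

numbers = np.arange(1, 200)

def digit_count(n: int) -> int:
    return len(str(n))

# Pairwise predictors over all number pairs (i < j).
pairs = [(i, j) for i in range(len(numbers)) for j in range(i + 1, len(numbers))]
log_dist = np.array([abs(np.log(numbers[i]) - np.log(numbers[j]))
                     for i, j in pairs])
boundary = np.array([float(digit_count(numbers[i]) != digit_count(numbers[j]))
                     for i, j in pairs])  # 1 if the pair straddles 10 or 100

# Simulate "observed" dissimilarities that genuinely contain a boundary boost.
rng = np.random.default_rng(0)
observed = 1.0 * log_dist + 0.5 * boundary + rng.normal(0, 0.1, len(pairs))

def r_squared(predictors, y):
    """Ordinary least squares with intercept; returns R^2."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_continuous = r_squared([log_dist], observed)
r2_cp_additive = r_squared([log_dist, boundary], observed)
print(f"continuous R^2:  {r2_continuous:.3f}")
print(f"CP-additive R^2: {r2_cp_additive:.3f}")
```

When the underlying dissimilarities contain a real boundary discontinuity, the CP-additive model's R² exceeds the continuous model's; the study's claim is that LLM hidden states show this pattern at digit-count boundaries but not at control positions.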

Key facts

  • Categorical perception (CP) is enhanced discriminability at category boundaries.
  • The study uses representational similarity analysis across six LLMs from five architecture families.
  • A CP-additive model fits better than a purely continuous model at 100% of primary layers in every model tested.
  • The effect is specific to digit-count transitions at 10 and 100.
  • No effect is found at non-boundary control positions.
  • No effect is found in the temperature domain (hot/cold).
  • Two qualitative signatures: 'classic CP' (Gemma, Qwen) and a second, distinct pattern in the remaining models.
  • The paper is published on arXiv with ID 2603.28258.

Entities

Institutions

  • arXiv

Sources