ARTFEED — Contemporary Art Intelligence

Spectral Geometry of Transformer Residual Stream Reveals Learned Dimensional Collapse

ai-technology · 2026-05-16

A recent arXiv preprint (2605.14258) performs a full Jacobian eigendecomposition across three production-scale LLMs, uncovering a consistent spectral gradient that runs from non-normal, rotation-dominated early layers to nearly symmetric late layers. It also identifies a cumulative low-rank bottleneck that channels perturbations into a small number of effective dimensions of the residual stream. Both the spectral gradient and the dimensional collapse turn out to be learned rather than inherent to the architecture, shedding light on how computation evolves as it propagates through transformer layers.
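
The core measurement is straightforward to sketch. Below is a minimal PyTorch illustration, assuming a toy residual block x -> x + f(x) stands in for a real transformer layer: compute the full Jacobian of one residual-stream update, eigendecompose it, and score how far the map is from a symmetric (normal) operator. The specific metrics here (imaginary-part fraction, commutator norm, asymmetry ratio) are plausible proxies, not necessarily the paper's exact protocol.

    import torch

    torch.manual_seed(0)
    d = 64  # toy residual-stream width

    # Stand-in for one transformer layer: the usual residual update x -> x + f(x).
    f = torch.nn.Sequential(
        torch.nn.Linear(d, 4 * d),
        torch.nn.GELU(),
        torch.nn.Linear(4 * d, d),
    )

    def block(x):
        return x + f(x)

    x0 = torch.randn(d)

    # Full d x d Jacobian of the update at x0: the local linearization of one
    # "time step" when depth is read as discrete time.
    J = torch.autograd.functional.jacobian(block, x0)

    # Spectrum: rotation-dominated maps have strongly complex eigenvalues,
    # near-symmetric maps have (almost) real ones.
    eig = torch.linalg.eigvals(J)
    imag_fraction = (eig.imag.abs().mean() / eig.abs().mean()).item()

    # Non-normality: a normal matrix commutes with its transpose, so the
    # Frobenius norm of the commutator (scaled by ||J||^2) measures departure.
    non_normality = (torch.linalg.norm(J @ J.T - J.T @ J)
                     / torch.linalg.norm(J) ** 2).item()

    # Asymmetry of J itself: 0 for a perfectly symmetric Jacobian.
    asymmetry = (torch.linalg.norm(J - J.T) / torch.linalg.norm(J + J.T)).item()

    print(f"mean |Im(eig)| / |eig|: {imag_fraction:.3f}")
    print(f"non-normality:          {non_normality:.3f}")
    print(f"asymmetry:              {asymmetry:.3f}")

Repeating this per layer of a trained model, rather than on one random block, is what would surface the early-to-late spectral gradient the paper reports.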

Key facts

  • Full Jacobian eigendecomposition performed across three production-scale LLMs
  • Training installs a monotonic spectral gradient from non-normal early layers to near-symmetric late layers
  • Cumulative low-rank bottleneck reduces effective dimensions of residual stream (see the effective-rank sketch after this list)
  • Spectral gradient and dimensional collapse are learned, not architectural
  • Treats depth as discrete time and residual stream as dynamical system
  • Previous analyses relied on scalar summaries or approximate linearizations
  • arXiv preprint 2605.14258
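
The dimensional-collapse claim can be probed under the same toy assumptions by composing layer Jacobians along depth and tracking the effective rank of the cumulative map. On a randomly initialized stack like the one below the rank should stay high; the paper's finding is precisely that training, not the architecture, installs the collapse. The entropy-based effective-rank definition is an illustrative choice, not confirmed as the paper's.

    import torch

    torch.manual_seed(0)
    d, n_layers = 64, 8

    def make_block(width):
        # Toy residual update x -> x + f(x); a placeholder for a trained layer.
        f = torch.nn.Sequential(
            torch.nn.Linear(width, 4 * width),
            torch.nn.GELU(),
            torch.nn.Linear(4 * width, width),
        )
        return lambda x: x + f(x)

    def effective_rank(M, eps=1e-12):
        # Entropy-based effective rank (Roy & Vetterli, 2007): exponential of
        # the Shannon entropy of the normalized singular-value distribution.
        s = torch.linalg.svdvals(M)
        p = s / s.sum() + eps
        return torch.exp(-(p * p.log()).sum()).item()

    blocks = [make_block(d) for _ in range(n_layers)]
    x = torch.randn(d)
    J_cum = torch.eye(d)

    for i, blk in enumerate(blocks):
        J = torch.autograd.functional.jacobian(blk, x)  # local Jacobian, layer i
        J_cum = J @ J_cum     # chain rule: Jacobian of the composite map 0..i
        x = blk(x).detach()   # advance the discrete-time state
        print(f"layer {i}: effective rank of cumulative Jacobian "
              f"= {effective_rank(J_cum):5.1f} / {d}")

A cumulative Jacobian whose effective rank falls far below the stream width would indicate that perturbations are being funneled into a few directions, the bottleneck behavior the paper attributes to training.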

Entities

Institutions

  • arXiv

Sources

  • arXiv:2605.14258