Invariant Properties Discovered in Softmax Attention Mechanisms
A new arXiv preprint (2605.02907) reveals invariant properties in softmax attention, a core component of transformer models. The authors define the 'energy field' as the row-centered attention logits; because softmax is invariant to adding a constant to each row of the logits, the energy field induces exactly the same attention distribution while making its structure explicit. They demonstrate two classes of invariants. Mechanism-level invariants follow from the algebraic structure of softmax attention itself: a per-row zero-sum constraint (immediate from row-centering), a rank bound determined by the head dimension, and characteristic spectral signatures. Model-level regularities, which are not forced by the mechanism, nevertheless hold across all tested autoregressive language models from several architecture families. In particular, the energy field's variance distributes over key positions without concentrating, a property the authors trace to 'key incoherence' in the key matrix. These findings have practical consequences for understanding and improving attention-based models.
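A minimal NumPy sketch of the two algebraic mechanism-level invariants, assuming the energy field is simply the logits minus their per-row mean. The matrices, dimensions, and names (W_q, W_k, d_head) here are random stand-ins for illustration, not weights from any model tested in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_head = 16, 64, 8

# Hypothetical random projections standing in for trained weights.
X = rng.standard_normal((n, d_model))
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))

Q, K = X @ W_q, X @ W_k
logits = Q @ K.T / np.sqrt(d_head)  # pre-softmax attention logits

# Energy field: row-centered logits. Softmax is shift-invariant per row,
# so E induces the same attention distribution as the raw logits.
E = logits - logits.mean(axis=1, keepdims=True)

# Mechanism-level invariant 1: each row of E sums to zero by construction.
assert np.allclose(E.sum(axis=1), 0.0)

# Mechanism-level invariant 2: rank bound from the head dimension.
# logits = Q K^T has rank <= d_head; row-centering multiplies by a
# projection of rank n - 1, so rank(E) <= min(d_head, n - 1).
singular_values = np.linalg.svd(E, compute_uv=False)
rank_E = int(np.sum(singular_values > 1e-10 * singular_values[0]))
assert rank_E <= min(d_head, n - 1)
print(f"rank(E) = {rank_E}, bound = {min(d_head, n - 1)}")
```

The singular values computed above are also the natural place to look for the paper's spectral signatures: only at most d_head of them can be nonzero, so the spectrum of the energy field is constrained by the head dimension regardless of sequence length.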
Key facts
- arXiv preprint 2605.02907
- Softmax attention maps query-key interactions into per-query probability distributions over key positions
- Energy field defined as the row-centered attention logits
- Two classes of invariants: mechanism-level and model-level
- Mechanism-level invariants include per-row zero-sum constraint, rank bound, spectral signatures
- Model-level regularities hold across all tested autoregressive language models
- Energy field variance delocalizes over key positions rather than concentrating on a few
- Delocalization traced to key incoherence in the key matrix (see the sketch after this list)
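The delocalization and key-incoherence facts can be illustrated with a rough NumPy sketch. The metrics below, each key position's share of the energy field's total column variance and mutual coherence as the largest cosine similarity between distinct normalized key rows, are assumed operationalizations for illustration, not necessarily the paper's definitions, and the random matrices stand in for trained keys and queries:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_head = 64, 32

# Hypothetical queries and keys; real models would supply trained Q, K.
Q = rng.standard_normal((n, d_head))
K = rng.standard_normal((n, d_head))

logits = Q @ K.T / np.sqrt(d_head)
E = logits - logits.mean(axis=1, keepdims=True)  # energy field

# Share of total variance carried by each key position (column of E).
col_var = E.var(axis=0)
share = col_var / col_var.sum()

# A delocalized field spreads variance broadly: the largest single-key
# share stays near the uniform value 1/n rather than approaching 1.
print(f"max share = {share.max():.4f}  (uniform would be {1.0 / n:.4f})")

# Key incoherence: maximum cosine similarity between distinct key rows.
# Nearly orthogonal (incoherent) keys keep this value small.
K_unit = K / np.linalg.norm(K, axis=1, keepdims=True)
G = K_unit @ K_unit.T
np.fill_diagonal(G, 0.0)
print(f"key coherence = {np.abs(G).max():.4f}  (lower = more incoherent)")
```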
Entities
Institutions
- arXiv