Invariant Properties Discovered in Softmax Attention Mechanisms
A new arXiv preprint (2605.02907) reveals invariant properties in softmax attention, a core component of transformer models. The authors define the 'energy field' as the row-centered attention logits; because softmax is invariant to adding a constant to each row of the logits, the energy field induces exactly the same attention distribution while making its structure explicit. They demonstrate two classes of invariants. Mechanism-level invariants follow from the algebraic structure of softmax attention itself: a per-row zero-sum constraint (immediate from row-centering), a rank bound determined by the head dimension, and characteristic spectral signatures. Model-level regularities, which are not forced by the mechanism, nevertheless hold across all tested autoregressive language models from several architecture families. In particular, the energy field's variance distributes over key positions without concentrating, a property the authors trace to 'key incoherence' in the key matrix. These findings have practical consequences for understanding and improving attention-based models.
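A minimal NumPy sketch of the two algebraic mechanism-level invariants, assuming the energy field is simply the logits minus their per-row mean. The matrices, dimensions, and names (W_q, W_k, d_head) here are random stand-ins for illustration, not weights from any model tested in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_head = 16, 64, 8

# Hypothetical random projections standing in for trained weights.
X = rng.standard_normal((n, d_model))
W_q = rng.standard_normal((d_model, d_head))
W_k = rng.standard_normal((d_model, d_head))

Q, K = X @ W_q, X @ W_k
logits = Q @ K.T / np.sqrt(d_head)  # pre-softmax attention logits

# Energy field: row-centered logits. Softmax is shift-invariant per row,
# so E induces the same attention distribution as the raw logits.
E = logits - logits.mean(axis=1, keepdims=True)

# Mechanism-level invariant 1: each row of E sums to zero by construction.
assert np.allclose(E.sum(axis=1), 0.0)

# Mechanism-level invariant 2: rank bound from the head dimension.
# logits = Q K^T has rank <= d_head; row-centering multiplies by a
# projection of rank n - 1, so rank(E) <= min(d_head, n - 1).
singular_values = np.linalg.svd(E, compute_uv=False)
rank_E = int(np.sum(singular_values > 1e-10 * singular_values[0]))
assert rank_E <= min(d_head, n - 1)
print(f"rank(E) = {rank_E}, bound = {min(d_head, n - 1)}")
```

The singular values computed above are also the natural place to look for the paper's spectral signatures: only at most d_head of them can be nonzero, so the spectrum of the energy field is constrained by the head dimension regardless of sequence length.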
Key facts
- arXiv preprint 2605.02907
- Softmax attention maps query-key interactions into per-query probability distributions over key positions
- Energy field defined as the row-centered attention logits
- Two classes of invariants: mechanism-level and model-level
- Mechanism-level invariants include per-row zero-sum constraint, rank bound, spectral signatures
- Model-level regularities hold across all tested autoregressive language models
- Energy field variance delocalizes over key positions rather than concentrating on a few
- Delocalization traced to key incoherence in the key matrix (see the sketch after this list)
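The delocalization and key-incoherence facts can be illustrated with a rough NumPy sketch. The metrics below, each key position's share of the energy field's total column variance and mutual coherence as the largest cosine similarity between distinct normalized key rows, are assumed operationalizations for illustration, not necessarily the paper's definitions, and the random matrices stand in for trained keys and queries:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_head = 64, 32

# Hypothetical queries and keys; real models would supply trained Q, K.
Q = rng.standard_normal((n, d_head))
K = rng.standard_normal((n, d_head))

logits = Q @ K.T / np.sqrt(d_head)
E = logits - logits.mean(axis=1, keepdims=True)  # energy field

# Share of total variance carried by each key position (column of E).
col_var = E.var(axis=0)
share = col_var / col_var.sum()

# A delocalized field spreads variance broadly: the largest single-key
# share stays near the uniform value 1/n rather than approaching 1.
print(f"max share = {share.max():.4f}  (uniform would be {1.0 / n:.4f})")

# Key incoherence: maximum cosine similarity between distinct key rows.
# Nearly orthogonal (incoherent) keys keep this value small.
K_unit = K / np.linalg.norm(K, axis=1, keepdims=True)
G = K_unit @ K_unit.T
np.fill_diagonal(G, 0.0)
print(f"key coherence = {np.abs(G).max():.4f}  (lower = more incoherent)")
```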
Entities
Institutions
- arXiv