Preisach Attention Layer: A New Sequence Model Based on Hysteresis
A new paper on arXiv introduces the Preisach Attention Layer (PAL), a sequence modeling architecture that replaces softmax attention with a binary relay operator inspired by the Preisach hysteresis model from physics. The operator is parameterized by learned activation and deactivation thresholds, and PAL maintains a stack of local extrema as internal state. The authors prove that a single-layer PAL-Transformer with O(1) depth is Turing-complete, whereas standard hard-attention transformers require O(log n) depth. They also show that PAL and transformers compute incomparable function classes: PAL computes historical range statistics in O(1) layers, a task for which transformers need O(log n) layers, while transformers can perform random-access retrieval, which PAL cannot do without auxiliary state. The paper is available on arXiv under ID 2605.23603.
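To make the "binary relay operator" concrete, here is a minimal Python sketch of the classical two-threshold relay (hysteron) from the Preisach hysteresis model. It is not the paper's implementation: the function name `relay`, the parameters `alpha` (activation threshold) and `beta` (deactivation threshold), and the fixed threshold values are assumptions made for illustration, whereas PAL is described as learning its thresholds.

```python
def relay(inputs, alpha, beta, initial=-1):
    """Classical Preisach relay (hysteron), shown for illustration only.

    The output switches to +1 when the input reaches alpha, switches to -1
    when the input falls to beta, and otherwise keeps its previous value.
    All names and values here are hypothetical, not taken from the paper.
    """
    if beta > alpha:
        raise ValueError("beta must not exceed alpha")
    state = initial
    outputs = []
    for x in inputs:
        if x >= alpha:
            state = 1       # activation threshold reached
        elif x <= beta:
            state = -1      # deactivation threshold reached
        # between the thresholds the relay remembers its last state
        outputs.append(state)
    return outputs


signal = [0.0, 0.6, 0.3, 0.9, 0.4, 0.1, 0.4]
print(relay(signal, alpha=0.8, beta=0.2))
# prints [-1, -1, -1, 1, 1, -1, -1]
```

The input 0.4 appears twice and yields different outputs (+1 after the rise to 0.9, -1 after the drop to 0.1). This dependence on history is the hysteresis that distinguishes the relay from a memoryless threshold.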
Key facts
- Preisach Attention Layer (PAL) is a novel sequence modeling architecture.
- PAL replaces softmax attention with a binary relay operator.
- The operator is parameterized by learned activation and deactivation thresholds.
- PAL maintains a stack of local extrema as internal state.
- A single-layer PAL-Transformer with O(1) depth is Turing-complete.
- Standard hard-attention transformers require O(log n) depth for Turing completeness.
- PAL computes historical range statistics in O(1) layers.
- Transformers require O(log n) layers for historical range statistics.
- Transformers support random-access retrieval that PAL cannot perform without auxiliary state.
- The paper is published on arXiv with ID 2605.23603.
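As a rough illustration of how a stack of past extrema can answer historical range queries with constant work per query, the sketch below keeps two monotonic stacks in Python. It is a simplification loosely modeled on the wiping-out behavior of the classical Preisach model, not the paper's data structure: the class `RangeTracker`, its method `update`, and the choice to push every sample (rather than only local extrema) are assumptions made for this example.

```python
class RangeTracker:
    """Hypothetical helper: tracks the historical range (max minus min).

    highs is kept strictly decreasing from bottom to top, lows strictly
    increasing, so a new value "wipes out" every earlier entry it dominates.
    The bottom of each stack is the all-time maximum or minimum.
    """

    def __init__(self):
        self.highs = []
        self.lows = []

    def update(self, x):
        # A new high erases all earlier highs that are not above it.
        while self.highs and self.highs[-1] <= x:
            self.highs.pop()
        self.highs.append(x)
        # A new low erases all earlier lows that are not below it.
        while self.lows and self.lows[-1] >= x:
            self.lows.pop()
        self.lows.append(x)
        # Historical range so far: all-time max minus all-time min.
        return self.highs[0] - self.lows[0]


tracker = RangeTracker()
print([tracker.update(x) for x in [3, 5, 2, 8, 4, 1, 6]])
# prints [0, 2, 3, 6, 6, 7, 7]
```

Each sample is pushed once and popped at most once per stack, so the amortized cost per step is constant. The stacks are indexed only by recency and magnitude, which gives some intuition for why lookup of an arbitrary earlier position would need extra state.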
Entities
Institutions
- arXiv