Preisach Attention Layer: A New Sequence Model Based on Hysteresis

publication · 2026-05-25

A new paper on arXiv introduces the Preisach Attention Layer (PAL), a sequence modeling architecture that replaces softmax attention with a binary relay operator inspired by the Preisach hysteresis model from physics. PAL maintains a stack of local extrema as internal state. The authors prove that a single-layer PAL-Transformer with O(1) depth is Turing-complete, whereas standard hard-attention transformers require O(log n) depth. They also show that PAL and transformers compute incomparable function classes. In one direction, PAL computes historical range statistics in O(1) layers, a task for which transformers need O(log n) layers. In the other, transformers can perform random-access retrieval, which PAL cannot do without auxiliary state. The paper is available on arXiv under ID 2605.23603.
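To make the relay idea concrete, here is a minimal sketch of the textbook hysteresis relay that the Preisach model is built from. It is not the paper's layer: the function name `relay`, the 0/1 output convention, and the fixed thresholds `alpha` (activation) and `beta` (deactivation) are assumptions for illustration, whereas PAL is described as learning its thresholds.

```python
from typing import Iterable, List


def relay(inputs: Iterable[float], alpha: float, beta: float, initial: int = 0) -> List[int]:
    """Textbook hysteresis relay (illustrative, not the paper's operator).

    Switches on (1) when the input reaches alpha, switches off (0) when the
    input drops to beta, and otherwise holds its previous state.
    """
    if beta > alpha:
        raise ValueError("deactivation threshold beta must not exceed alpha")
    state = initial
    outputs: List[int] = []
    for x in inputs:
        if x >= alpha:
            state = 1      # activation threshold reached
        elif x <= beta:
            state = 0      # deactivation threshold reached
        # between the thresholds the relay remembers its last state
        outputs.append(state)
    return outputs


if __name__ == "__main__":
    print(relay([0.0, 0.6, 0.4, 0.1, 0.4, 0.6], alpha=0.5, beta=0.2))
```

In this example the input value 0.4 appears twice and yields different outputs depending on what came before it, which is the history dependence that distinguishes a relay from a pointwise threshold.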

Key facts

Preisach Attention Layer (PAL) is a novel sequence modeling architecture.
PAL replaces softmax attention with a binary relay operator.
The operator is parameterized by learned activation and deactivation thresholds.
PAL maintains a stack of local extrema as internal state.
A single-layer PAL-Transformer with O(1) depth is Turing-complete.
Standard hard-attention transformers require O(log n) depth for Turing completeness.
PAL computes historical range statistics in O(1) layers.
Transformers require O(log n) layers for historical range statistics.
Transformers support random-access retrieval that PAL cannot perform without auxiliary state.
The paper is published on arXiv with ID 2605.23603.
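As a rough illustration of how a stack of extrema can answer historical range queries in constant time, the sketch below uses a pair of monotonic stacks, loosely analogous to the wiping-out behavior of the classical Preisach model, where a new extreme erases the smaller extrema it dominates. This is one plausible reading of "stack of local extrema", not the paper's algorithm; the class `ExtremaStack` and its method names are hypothetical.

```python
from typing import List


class ExtremaStack:
    """Monotonic-stack sketch of extrema memory (hypothetical, not from the paper)."""

    def __init__(self) -> None:
        self.maxima: List[float] = []  # strictly decreasing; bottom is the all-time max
        self.minima: List[float] = []  # strictly increasing; bottom is the all-time min

    def push(self, x: float) -> None:
        # A new value wipes out every stored maximum it meets or exceeds.
        while self.maxima and self.maxima[-1] <= x:
            self.maxima.pop()
        self.maxima.append(x)
        # Symmetrically, it wipes out every stored minimum it meets or undercuts.
        while self.minima and self.minima[-1] >= x:
            self.minima.pop()
        self.minima.append(x)

    def historical_range(self) -> float:
        """Return max minus min over all inputs seen so far."""
        if not self.maxima:
            raise ValueError("no inputs seen yet")
        return self.maxima[0] - self.minima[0]


if __name__ == "__main__":
    memory = ExtremaStack()
    for value in [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]:
        memory.push(value)
    print(memory.maxima, memory.minima, memory.historical_range())
```

Each input is pushed once and popped at most once per stack, so the update cost is amortized constant per step and the range is read directly from the two stack bottoms.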
