ARTFEED — Contemporary Art Intelligence

Toeplitz MLP Mixer: Efficient Sequence Model with Linear Complexity

ai-technology · 2026-05-11

Researchers have introduced the Toeplitz MLP Mixer (TMM), a transformer-like architecture that replaces attention with triangular-masked Toeplitz matrix multiplication. The design runs in O(dn log n) time and O(dn) space during training, and in O(dn) time and space during inference prefill, improving on the quadratic cost of standard attention. Although TMMs lack sophisticated input modulation and state maintenance, they train more efficiently in terms of loss per unit of compute and device memory. They also retain more of their input, which translates into stronger copying performance and better information retrieval. The full paper is available on arXiv.
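The core mixing operation is simple enough to sketch. Below is an illustrative PyTorch version, a minimal sketch rather than the paper's implementation: the function name, tensor layout, and per-channel kernel (holding the first column of the Toeplitz matrix) are our own assumptions.

    import torch

    def causal_toeplitz_mix(x: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
        """Mix a sequence with a lower-triangular (causal) Toeplitz
        matrix, computed as an FFT convolution.

        x:      (batch, n, d) input sequence
        kernel: (n, d) first column of the Toeplitz matrix, one per channel
        Returns a (batch, n, d) tensor in O(d n log n) time, versus the
        O(d n^2) cost of materialising the full n x n matrix.
        """
        n = x.shape[1]
        m = 2 * n  # zero-pad so circular convolution becomes linear convolution
        x_f = torch.fft.rfft(x, n=m, dim=1)
        k_f = torch.fft.rfft(kernel, n=m, dim=0)
        # The kernel holds only non-negative lags, so output position i
        # depends only on inputs j <= i: exactly a triangular-masked
        # Toeplitz matrix multiplication.
        return torch.fft.irfft(x_f * k_f, n=m, dim=1)[:, :n, :]

Zero-padding to length 2n is what turns the FFT's circular convolution into a linear, causal one; the n log n cost of the transforms, applied across d channels, is the source of the O(dn log n) training-time figure.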

Key facts

  • Toeplitz MLP Mixer (TMM) is introduced as a transformer-like architecture.
  • TMM swaps attention for triangular-masked Toeplitz matrix multiplication (made concrete in the check after this list).
  • Training complexity: O(dn log n) time and O(dn) space.
  • Inference prefill complexity: O(dn) time and space.
  • TMMs lack sophisticated input modulation and state maintenance.
  • TMMs yield greater training efficiency in loss per compute and device memory.
  • TMMs retain more input information and have improved copying ability.
  • Paper available on arXiv with ID 2605.06683.
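To make the triangular mask in the second bullet concrete, the hypothetical check below materialises the explicit n x n lower-triangular Toeplitz matrix per channel and confirms it agrees with the FFT path. It assumes causal_toeplitz_mix from the sketch above is in scope; all names and shapes are illustrative.

    import torch

    n, d, batch = 6, 4, 2
    x = torch.randn(batch, n, d)
    kernel = torch.randn(n, d)

    # Explicit triangular-masked Toeplitz matrix: T[i, j] = kernel[i - j]
    # on and below the diagonal, 0 above it (the causal mask).
    idx = torch.arange(n)
    lags = idx[:, None] - idx[None, :]            # (n, n) values of i - j
    T = kernel[lags.clamp(min=0)]                 # (n, n, d) per-channel matrix
    T = T * (lags >= 0).unsqueeze(-1)             # zero the upper triangle

    # O(d n^2) dense reference: one matrix-vector product per channel.
    y_dense = torch.einsum('ijd,bjd->bid', T, x)

    # Agrees with the O(d n log n) FFT path.
    y_fft = causal_toeplitz_mix(x, kernel)
    assert torch.allclose(y_dense, y_fft, atol=1e-4)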

Entities

Institutions

  • arXiv

Sources