Transformers Can Simulate Arbitrary Attention Mechanisms
A new paper on arXiv asks whether transformer encoders can simulate arbitrary attention mechanisms. The authors construct a universal simulator U, itself built from transformer encoder layers, that can replicate the computation of any vanilla (softmax) attention mechanism. The work sits at the intersection of learnability and expressivity, addressing a theoretical gap between data-driven probabilistic guarantees on one side and deterministic computability proofs on the other. Previous research established the Turing-completeness of transformers and explored bounds from circuit complexity and formal logic. The study provides a theoretical framework for understanding the computational limits of transformer architectures.
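For context, "vanilla attention" conventionally refers to scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below is a minimal NumPy implementation of that baseline mechanism for reference only; the function name, shapes, and toy inputs are illustrative assumptions and are not drawn from the paper's construction of U.

```python
import numpy as np

def vanilla_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n, n) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize softmax; output unchanged
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # attention-weighted sum of values

# Toy usage (hypothetical sizes): 4 tokens with 8-dimensional Q/K/V.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = vanilla_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Subtracting the row-wise maximum before exponentiating is the standard numerical-stability trick; it leaves the softmax output mathematically unchanged.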
Key facts
- Paper titled 'On the Existence of Universal Simulators of Attention'
- Published on arXiv with ID 2506.18739
- Investigates the ability of transformer encoders to simulate vanilla attention mechanisms
- Constructs a universal simulator U composed of transformer encoder layers
- Bridges learnability and expressivity in transformer research
- Previous work focused on data-driven probabilistic guarantees
- Earlier results proved Turing-completeness of transformers
- Study examines circuit complexity and formal logic bounds
Entities
Institutions
- arXiv