Transformers Can Simulate Arbitrary Attention Mechanisms
A new paper on arXiv asks whether transformer encoders can simulate arbitrary attention mechanisms. The authors construct a universal simulator U, itself built from transformer encoder layers, that can replicate the computation of any vanilla (softmax) attention mechanism. The work sits at the intersection of learnability and expressivity, addressing a theoretical gap between data-driven probabilistic guarantees on one side and deterministic computability proofs on the other. Previous research established the Turing-completeness of transformers and explored bounds from circuit complexity and formal logic. The study provides a theoretical framework for understanding the computational limits of transformer architectures.
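For context, "vanilla attention" conventionally refers to scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. The sketch below is a minimal NumPy implementation of that baseline mechanism for reference only; the function name, shapes, and toy inputs are illustrative assumptions and are not drawn from the paper's construction of U.

```python
import numpy as np

def vanilla_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # (n, n) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize softmax; output unchanged
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # attention-weighted sum of values

# Toy usage (hypothetical sizes): 4 tokens with 8-dimensional Q/K/V.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
out = vanilla_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Subtracting the row-wise maximum before exponentiating is the standard numerical-stability trick; it leaves the softmax output mathematically unchanged.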
Key facts
- Paper titled 'On the Existence of Universal Simulators of Attention'
- Published on arXiv with ID 2506.18739
- Investigates the ability of transformer encoders to simulate vanilla attention mechanisms
- Constructs a universal simulator U composed of transformer encoder layers
- Bridges learnability and expressivity in transformer research
- Previous work focused on data-driven probabilistic guarantees
- Earlier results proved Turing-completeness of transformers
- Study examines circuit complexity and formal logic bounds
Entities
Institutions
- arXiv