ARTFEED — Contemporary Art Intelligence

Transformers with Average Attention Match Arithmetic Circuits

ai-technology · 2026-05-07

A recent study posted on arXiv (2605.04683) investigates the computational power of transformer encoders viewed as sequence-to-sequence maps over vectors. The authors show that transformers with average hard attention can simulate arithmetic circuit families of constant depth built from unbounded fan-in addition, binary multiplication, and sign gates. In this setting, the transformers use arithmetic circuits in place of feed-forward networks. Conversely, the functions computed by such transformers with ordinary average attention can themselves be computed by the same class of circuit families, so the two models match in power. The results hold for transformers over the reals, the rationals, and any ring in between. The paper is classified under Computer Science > Computational Complexity.
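To make the attention mechanism concrete, here is a minimal sketch of average hard attention as it is usually defined in this line of work: instead of a softmax, attention weight is split uniformly over the positions that attain the maximal score. The function name and example values are illustrative, not taken from the paper.

```python
import numpy as np

def average_hard_attention(scores, values):
    """Average hard attention: attend uniformly to the positions
    achieving the maximal attention score, instead of using softmax."""
    scores = np.asarray(scores, dtype=float)
    values = np.asarray(values, dtype=float)
    mask = scores == scores.max()   # positions tied for the maximum score
    weights = mask / mask.sum()     # uniform weights over the argmax set
    return weights @ values         # average of the selected value vectors

# Example: positions 1 and 2 tie for the maximum score,
# so their values 2.0 and 4.0 are averaged.
out = average_hard_attention([2.0, 5.0, 5.0], [[1.0], [2.0], [4.0]])
# out == [3.0]
```

The hard-maximum selection is what lets the constructions in such papers route exact values between positions, rather than the smoothed mixtures a softmax produces.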

Key facts

  • arXiv paper ID 2605.04683
  • Title: Average Attention Transformers and Arithmetic Circuits
  • Analyzes computational power of transformer encoders
  • Average hard attention can simulate arithmetic circuits
  • Simulated circuits have constant depth
  • Circuits use unbounded fan-in addition, binary multiplication, and sign gates
  • Transformers use arithmetic circuits instead of feed-forward networks
  • Results hold for reals, rationals, and intermediate rings
  • Classified under Computer Science > Computational Complexity
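The gate types listed above can be sketched directly; the following hypothetical example is not from the paper, but shows what a constant-depth circuit over these gates looks like: addition takes any number of inputs, multiplication exactly two, and the sign gate maps to -1, 0, or +1.

```python
def add_gate(*xs):
    """Unbounded fan-in addition gate."""
    return sum(xs)

def mul_gate(x, y):
    """Binary (fan-in-2) multiplication gate."""
    return x * y

def sign_gate(x):
    """Sign gate: -1 for negative, 0 for zero, +1 for positive."""
    return (x > 0) - (x < 0)

# A depth-3 example circuit over the rationals: sign(x1*x2 + x3 + x4)
def example_circuit(x1, x2, x3, x4):
    return sign_gate(add_gate(mul_gate(x1, x2), x3, x4))

example_circuit(2, -3, 1, 1)  # sign(-6 + 1 + 1) = sign(-4) = -1
```

Constant depth means the number of gate layers is fixed regardless of input length, which is why the unbounded fan-in of the addition gate matters: wide sums cannot be rebuilt from binary additions without growing the depth.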

Entities

Institutions

  • arXiv

Sources