Transformer Attention Heads: Positional vs Symbolic Dynamics

other · 2026-06-01

A study of decoder-only Transformers (GPT-J) finds that successful multi-hop reasoning requires the emergence of pure attention heads, either positional or symbolic. Two structurally equivalent tasks, number reasoning and letter reasoning, nonetheless impose different mechanistic demands on the model.

Key facts

Study uses GPT-J model
Two tasks: number (positional) and letter (symbolic)
Pure heads emerge during successful learning
Tasks are structurally equivalent but require different head types
Number task needs both positional and symbolic heads
Letter task requires only symbolic heads
Research aims to understand safe deployment of LLMs
Published on arXiv (2605.31558)
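
The distinction between the two head types can be illustrated with a toy sketch. This is not the study's analysis method or GPT-J code: the function names `positional_head` and `symbolic_head`, the fixed offset, and the example sequences are all hypothetical, chosen only to show that an idealized positional head depends on where a token sits, while an idealized symbolic head depends on which token it is.

```python
def positional_head(tokens, query_idx, offset=-1):
    """Idealized positional head (hypothetical): attend to a fixed
    relative offset from the query, ignoring token identity."""
    target = query_idx + offset
    return [1.0 if i == target else 0.0 for i in range(len(tokens))]


def symbolic_head(tokens, query_idx):
    """Idealized symbolic head (hypothetical): attend uniformly to earlier
    positions holding the same token as the query, ignoring position."""
    matches = [i for i in range(query_idx) if tokens[i] == tokens[query_idx]]
    if not matches:
        return [0.0] * len(tokens)
    weight = 1.0 / len(matches)
    return [weight if i in matches else 0.0 for i in range(len(tokens))]


# Same tokens, with the first two swapped between the sequences.
tokens_a = ["a", "b", "c", "a"]
tokens_b = ["b", "a", "c", "a"]
query_idx = 3  # the final "a" is the query in both sequences

# The positional pattern is identical for both sequences.
pos_a = positional_head(tokens_a, query_idx)
pos_b = positional_head(tokens_b, query_idx)

# The symbolic pattern follows the earlier "a" wherever it moves.
sym_a = symbolic_head(tokens_a, query_idx)
sym_b = symbolic_head(tokens_b, query_idx)
```

In this sketch, swapping the first two tokens leaves the positional pattern unchanged and moves the symbolic pattern with the matching token. Heads in a trained model are rarely this clean; "pure" here means only that one of the two dependencies dominates.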
