ARTFEED — Contemporary Art Intelligence

Transformer Attention Heads: Positional vs Symbolic Dynamics

other · 2026-06-01

A study of a decoder-only Transformer (GPT-J) reports that successful multi-hop reasoning depends on the emergence of pure attention heads, each either positional or symbolic. Two structurally equivalent tasks, number reasoning and letter reasoning, nonetheless impose different mechanistic demands on the model.

Key facts

  • Study uses GPT-J model
  • Two tasks: number (positional) and letter (symbolic)
  • Pure heads emerge during successful learning
  • Tasks are structurally equivalent but require different head types
  • Number task needs both positional and symbolic heads
  • Letter task requires only symbolic heads
  • Research aims to inform the safe deployment of LLMs
  • Published on arXiv (2605.31558)
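
To make the positional/symbolic distinction in the list above concrete, here is a minimal toy sketch in Python with NumPy. It is not the study's code and does not use GPT-J: the functions `positional_head` and `symbolic_head`, the `offset` and `sharpness` parameters, and the example token sequences are all hypothetical, chosen only to show one attention pattern that depends purely on position and one that depends purely on token identity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_mask(n):
    # Decoder-only models let position i attend only to positions j <= i.
    return np.tril(np.ones((n, n), dtype=bool))


def positional_head(tokens, offset=1, sharpness=10.0):
    # Hypothetical "pure positional" head: the scores depend only on the
    # relative position i - j. The tokens are used for their length alone.
    n = len(tokens)
    i, j = np.indices((n, n))
    scores = sharpness * (i - j == offset)
    scores = np.where(causal_mask(n), scores, -np.inf)
    return softmax(scores)


def symbolic_head(tokens, sharpness=10.0):
    # Hypothetical "pure symbolic" head: the scores depend only on whether
    # the query token and the key token are the same symbol.
    tokens = np.asarray(tokens)
    n = len(tokens)
    same = tokens[:, None] == tokens[None, :]
    scores = sharpness * same
    scores = np.where(causal_mask(n), scores, -np.inf)
    return softmax(scores)


seq_a = ["A", "B", "A", "C"]
seq_b = ["B", "A", "A", "C"]  # same symbols, first two positions swapped

# The positional pattern ignores which symbols occupy the positions.
pos_a = positional_head(seq_a)
pos_b = positional_head(seq_b)
print("positional pattern unchanged:", np.allclose(pos_a, pos_b))

# The symbolic pattern follows the matching symbol wherever it moves.
sym_a = symbolic_head(seq_a)
sym_b = symbolic_head(seq_b)
print("symbolic pattern unchanged:", np.allclose(sym_a, sym_b))
```

In this toy setup, swapping the first two tokens leaves the positional pattern identical, while the symbolic pattern shifts so that the query "A" at index 2 attends to the earlier "A" in its new location. A real model's heads are learned and rarely this clean, which is presumably why the study treats the emergence of pure heads as notable.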

Entities

Institutions

  • arXiv

Sources