ARTFEED — Contemporary Art Intelligence

Transformers Can Learn Superposition via Möbius Attractor and Cascade Supervision

publication · 2026-05-20

A new arXiv paper (2605.18820v1) proves that gradient descent can learn superposition in Transformers, closing a gap left open by Zhu et al. (2025). The authors identify a Möbius attractor in the layerwise dynamics under S_n-symmetry, reducing the optimization to a 1D Möbius map whose zero set contains the equal-weight superposition state. They also introduce Cascade Supervision, a loss class that delivers selectivity through the backward pass. The work focuses on Reachability-by-Superposition over Erdős–Rényi graphs.
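For readers unfamiliar with the setting, the Python sketch below illustrates what an equal-weight superposition state could look like on a reachability task over an Erdős–Rényi graph. It is not the paper's construction and involves no Transformer or gradient descent. It assumes an undirected G(n, p) graph and assumes that the superposition state spreads equal weight over every node reached so far. The function names (`erdos_renyi_graph`, `superposition_step`, `reachable_by_superposition`) are hypothetical and chosen for illustration only.

```python
import random
from typing import List, Set


def erdos_renyi_graph(n: int, p: float, seed: int = 0) -> List[Set[int]]:
    """Sample an undirected G(n, p) graph as adjacency sets.

    Assumption: undirected edges; the summary does not say which variant is used.
    """
    rng = random.Random(seed)
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is an edge independently with probability p
                adj[u].add(v)
                adj[v].add(u)
    return adj


def superposition_step(state: List[float], adj: List[Set[int]]) -> List[float]:
    """Grow the support by one hop, then renormalise to equal weights."""
    support = {u for u, w in enumerate(state) if w > 0}
    for u in list(support):
        support |= adj[u]
    weight = 1.0 / len(support)
    return [weight if v in support else 0.0 for v in range(len(state))]


def reachable_by_superposition(adj: List[Set[int]], source: int, target: int) -> bool:
    """Return True if the target carries nonzero weight once the state stops growing."""
    n = len(adj)
    state = [0.0] * n
    state[source] = 1.0
    for _ in range(n - 1):  # a simple path visits at most n - 1 edges
        new_state = superposition_step(state, adj)
        if new_state == state:  # support stopped growing
            break
        state = new_state
    return state[target] > 0


if __name__ == "__main__":
    graph = erdos_renyi_graph(n=12, p=0.15, seed=1)
    print(reachable_by_superposition(graph, source=0, target=5))
```

Each step keeps every explored node in the state at once instead of committing to a single path, which is the intuition behind answering reachability "by superposition". Whether the learned Transformer weights realise exactly this update is a claim of the paper that this sketch does not verify.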

Key facts

  • Paper arXiv:2605.18820v1
  • Published on arXiv
  • Focuses on superposition in Transformers
  • Identifies Möbius attractor under S_n-symmetry
  • Introduces Cascade Supervision loss class
  • Addresses Reachability-by-Superposition over Erdős–Rényi graphs
  • Builds on work by Zhu et al. (2025)
  • Proves gradient descent can find superposition state

Entities

Institutions

  • arXiv
