ARTFEED — Contemporary Art Intelligence

Attractor Geometry Explains Transformer Memory Failures

ai-technology · 2026-05-09

A new study posted to arXiv (2605.05686) argues that language models exhibit two distinct failure modes, conflict and hallucination, both rooted in the attractor geometry of hidden-state space. Conflict arises when parametric memory (PM) and working memory (WM) disagree, disrupting convergence to the correct attractor basin without increasing output entropy. Hallucination occurs when no memorized basin exists, so the hidden state drifts freely. The frozen LM head, designed for next-token prediction, confidently emits tokens in both cases, which renders output-based monitoring ineffective. The findings were verified on a controlled synthetic task built around entity identifiers.
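
The geometric account lends itself to a toy illustration. The sketch below (Python with NumPy) is a conceptual approximation rather than the paper's code: it stands in for learned facts with fixed centroid vectors and classifies a final hidden state as recall, conflict, or hallucination by its distance to the nearest basin. The entity names, dimensionality, and radius threshold are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 64  # toy hidden-state dimensionality

    # Hypothetical "attractor centroids": one stored vector per learned fact.
    # In the paper's framing these basins emerge from training; here they are
    # random points standing in for memorized facts about entities.
    fact_centroids = {f"entity_{i}": rng.normal(size=DIM) for i in range(10)}

    def diagnose(hidden_state, queried_entity, basin_radius=4.0):
        """Classify a hidden state by where it sits in hidden-state space.

        recall        -- nearest basin belongs to the queried fact
        conflict      -- nearest basin belongs to a competing fact
        hallucination -- no basin is nearby; the state has drifted
        The radius threshold is an illustrative choice, not a paper value.
        """
        dists = {name: np.linalg.norm(hidden_state - c)
                 for name, c in fact_centroids.items()}
        nearest, d_min = min(dists.items(), key=lambda kv: kv[1])
        if d_min > basin_radius:
            return "hallucination"
        return "recall" if nearest == queried_entity else "conflict"

    # Three illustrative hidden states for a query about entity_3:
    in_basin  = fact_centroids["entity_3"] + 0.1 * rng.normal(size=DIM)
    competing = fact_centroids["entity_7"] + 0.1 * rng.normal(size=DIM)
    drifting  = 10.0 * rng.normal(size=DIM)  # far from every stored basin

    for h in (in_basin, competing, drifting):
        print(diagnose(h, "entity_3"))  # recall, conflict, hallucination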

Key facts

  • Language models draw on parametric memory (PM, facts stored in the weights) and working memory (WM, information carried in the context).
  • Conflict occurs when PM and WM disagree and interfere.
  • Hallucination occurs when the queried fact was never learned.
  • Both failures produce confident output, bypassing output-based monitoring (see the sketch after this list).
  • Failures share a unified geometric account in hidden-state space.
  • Learned facts form attractor basins; conflict is basin competition.
  • Hallucination is basin absence; hidden state drifts freely.
  • Frozen LM head cannot distinguish between the two failure modes.
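
To illustrate why confidence at the output layer is uninformative, the toy below (again NumPy, with random stand-in vectors rather than a real model) passes three hidden states through a fixed linear head and compares the entropies of the resulting next-token distributions. The dimensions and vocabulary size are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    DIM, VOCAB = 64, 50

    # A frozen "LM head": a fixed linear map from hidden states to vocabulary
    # logits. The matrix and the hidden states below are random stand-ins.
    W_head = rng.normal(size=(VOCAB, DIM))

    def output_entropy(hidden_state):
        """Entropy (nats) of softmax(W_head @ hidden_state)."""
        logits = W_head @ hidden_state
        logits -= logits.max()                        # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return float(-(probs * np.log(probs + 1e-12)).sum())

    cases = {
        "correct basin": rng.normal(size=DIM),        # converged to the right fact
        "conflict":      rng.normal(size=DIM),        # pulled toward a rival basin
        "hallucination": 5.0 * rng.normal(size=DIM),  # no basin: free drift
    }

    print(f"uniform baseline = {np.log(VOCAB):.2f} nats")
    for name, h in cases.items():
        print(f"{name:14s} entropy = {output_entropy(h):.2f} nats")
    # Every case yields an entropy well below the uniform baseline: the head
    # stays confident whether or not the state sits near a learned basin,
    # mirroring the article's point that output monitoring cannot separate
    # the regimes; a hidden-state probe (as in the earlier sketch) can.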

Entities

Institutions

  • arXiv

Sources

  • arXiv:2605.05686