RoPE Fails in Long Contexts: Attention Becomes Random
A new theoretical analysis published on arXiv (2605.15514) proves that Rotary Positional Embeddings (RoPE) lose their effectiveness in Transformer-based language models as context length increases. The analysis depends only on context length, not on the specific content of the sequence. It shows that in long contexts RoPE-based attention becomes unpredictable and loses two properties it is expected to provide: a locality bias that favors nearby positions, and consistency in which tokens it treats as relevant. The probability of such a failure approaches 0.5, which is no better than random guessing. In addition, attention scores can remain unchanged when a key token is moved or replaced, meaning the mechanism can fail to distinguish between positions or between tokens.
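To make the quantity under discussion concrete, here is a minimal sketch of a RoPE attention score in plain Python. It is not the paper's construction or proof. The helper names `rope_rotate` and `rope_score`, the frequency base of 10000, the dimension of 64, and the random query and key vectors are all illustrative assumptions. The sketch only shows the standard property that a RoPE score depends on the relative offset between query and key, and prints how the score for one fixed query/key pair varies as the key is placed farther away.

```python
import math
import random


def rope_rotate(vec, pos, base=10000.0):
    """Rotate each consecutive pair of `vec` by an angle proportional to `pos`.

    Assumes the common RoPE frequency schedule theta_i = base ** (-i / d)
    for even indices i; `base` is an illustrative default.
    """
    d = len(vec)
    out = [0.0] * d
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out


def rope_score(q, k, q_pos, k_pos):
    """Unnormalized attention logit between a query at q_pos and a key at k_pos."""
    qr = rope_rotate(q, q_pos)
    kr = rope_rotate(k, k_pos)
    return sum(a * b for a, b in zip(qr, kr))


# Hypothetical query and key vectors, drawn at random for illustration only.
rng = random.Random(0)
dim = 64
q = [rng.gauss(0, 1) for _ in range(dim)]
k = [rng.gauss(0, 1) for _ in range(dim)]

# Two placements with the same relative offset (10) give the same score.
same_offset_a = rope_score(q, k, 100, 90)
same_offset_b = rope_score(q, k, 5000, 4990)

# Score for the same key content at increasing distances from the query.
distances = [1, 10, 100, 1000, 10000, 100000]
scores = [rope_score(q, k, d, 0) for d in distances]
for d, s in zip(distances, scores):
    print(f"distance={d:>6}  score={s:+.3f}")
```

Running the loop with different seeds or larger distances is one informal way to see how the sign and size of the score change with distance for a given query/key pair. Any such experiment is only an illustration and does not substitute for the paper's analysis.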
Key facts
- arXiv paper 2605.15514 identifies intrinsic limitations of RoPE in long-context language models.
- Theoretical analysis depends only on context length, not specific content.
- RoPE loses its locality bias in long contexts: nearer positions are favored no more than distant ones.
- RoPE loses consistency in token relevance; attention scores become unpredictable.
- The probability of failure approaches 0.5 as context length grows, no better than random guessing.
- Attention scores can remain unchanged when a key token is moved or replaced.
- The study proves that RoPE can fail to distinguish positions or tokens in long contexts.
Entities
Institutions
- arXiv