ARTFEED — Contemporary Art Intelligence

Impossible Triangle: Efficiency, Compactness, and Recall in Long-Context Models

publication · 2026-05-07

A recent arXiv paper establishes a fundamental trade-off in models designed for long sequences: no model can simultaneously achieve Efficiency (per-step computation independent of sequence length), Compactness (state size independent of sequence length), and Recall (the number of facts retrievable from the history growing in proportion to sequence length). The authors introduce an Online Sequence Processor framework that subsumes Transformers, state space models, linear recurrent networks, and combinations thereof. Using the Data Processing Inequality and Fano's Inequality, they prove that any model satisfying Efficiency and Compactness can recall at most O(poly(d)/log V) key-value pairs from a sequence of any length, where d is the model dimension and V the vocabulary size. The study classifies 52 architectures released before March 2026 and finds that each satisfies at most two of the three properties.
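The shape of that bound can be sketched in a few lines. The back-of-the-envelope derivation below is our reading of the standard information-theoretic route, not the paper's actual proof; in particular, the assumption that a Compact state occupies at most poly(d) bits is ours.

    % Compactness: the state S fits in B <= poly(d) bits, so H(S) <= B.
    % Data Processing Inequality: every answer about the history must
    % pass through S, so the recoverable information is at most H(S).
    % Recovering N key-value pairs, with values drawn from a vocabulary
    % of size V, needs about N log2 V bits; Fano's Inequality converts
    % any shortfall into a constant probability of error. Hence:
    \[
      N \log_2 V \;\le\; H(S) \;\le\; \mathrm{poly}(d)
      \quad\Longrightarrow\quad
      N = O\!\left(\frac{\mathrm{poly}(d)}{\log V}\right).
    \]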

Key facts

  • Paper published as arXiv:2605.05066
  • Proves a fundamental trade-off in long-sequence models
  • Three properties: Efficiency, Compactness, Recall
  • Formalized within an Online Sequence Processor abstraction (sketched in code after this list)
  • Uses Data Processing Inequality and Fano's Inequality
  • Recall bound: O(poly(d)/log V) key-value pairs
  • Classifies 52 architectures from before March 2026
  • No model achieves all three properties simultaneously
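
To make the three properties concrete, here is a minimal Python sketch of what an Online Sequence Processor interface might look like, with two toy instances at opposite corners of the triangle. The names, the interface, and the eviction policy are illustrative assumptions on our part, not the authors' definitions.

    from abc import ABC, abstractmethod


    class OnlineSequenceProcessor(ABC):
        """Illustrative interface: consume a stream one token at a time,
        maintaining a state, then answer queries about the history."""

        @abstractmethod
        def step(self, token):
            """Consume one (key, value) token and update the state.
            Efficiency asks that this cost not grow with stream length."""

        @abstractmethod
        def recall(self, key):
            """Return the value last paired with `key`, if recoverable.
            Recall asks that the number of answerable pairs grow with length."""


    class CacheProcessor(OnlineSequenceProcessor):
        """Transformer-like corner: keep every pair (a KV cache).
        Recall is perfect, but state grows with length (no Compactness)
        and answering a query scans the whole history (no Efficiency)."""

        def __init__(self):
            self.pairs = []                            # unbounded state

        def step(self, token):
            self.pairs.append(token)

        def recall(self, key):
            for k, v in reversed(self.pairs):          # O(t) scan, like attention
                if k == key:
                    return v
            return None


    class FixedStateProcessor(OnlineSequenceProcessor):
        """Recurrent/SSM-like corner: state capped at `capacity` pairs,
        standing in for poly(d) bits divided by ~log2(V) bits per pair.
        Efficiency and Compactness hold, but Recall is bounded: once the
        cap is hit, each new key evicts the oldest stored one."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.memory = {}                           # dicts keep insertion order

        def step(self, token):
            key, value = token
            if key not in self.memory and len(self.memory) >= self.capacity:
                self.memory.pop(next(iter(self.memory)))  # evict oldest key
            self.memory[key] = value                   # O(1) per step

        def recall(self, key):
            return self.memory.get(key)                # O(1), misses evicted keys


    if __name__ == "__main__":
        # Stream 10,000 (key, value) pairs over 100 cyclic keys.
        rnn = FixedStateProcessor(capacity=32)
        for i in range(10_000):
            rnn.step((i % 100, i))
        print(rnn.recall(99))   # 9999: recently introduced keys survive
        print(rnn.recall(0))    # None: evicted, no matter how long the stream

Growing `capacity` buys Recall at the price of state size, the Compactness axis; keeping every pair instead, as CacheProcessor does, buys Recall at the price of per-query work, the Efficiency axis.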

Entities

Institutions

  • arXiv
