StateX enhances RNN recall by expanding recurrent state post-training
A new method called StateX improves the recall ability of recurrent neural networks (RNNs) by expanding their recurrent state size after training. RNNs, including linear attention and state-space models, are popular for processing long contexts because of their constant per-token complexity, but they struggle with tasks that require accurate recall because the entire context is compressed into a fixed-size state. Prior work shows that recall correlates with state size, yet training RNNs with larger states is costly. StateX is a post-training framework that modifies both classes of architecture to scale up the state size with a negligible increase in parameters. Experiments on models with up to 7 billion parameters demonstrate improved recall on long-context tasks. The paper is available on arXiv under identifier 2509.22630.
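To make the state-size bottleneck concrete, below is a minimal sketch of the (unnormalized) linear-attention recurrence that such models build on. The dimensions, the single-head setup, and the expansion arithmetic in the comments are illustrative assumptions for this sketch, not StateX's actual architecture, which the paper defines for both linear attention and state-space models.

```python
import torch

def linear_attention_step(S, q, k, v):
    """One recurrent step of (simplified, unnormalized) linear attention.

    The entire context is compressed into the state S, whose d_k x d_v
    size is fixed regardless of sequence length -- this is the state
    that bounds recall.
    """
    S = S + torch.outer(k, v)   # update: S_t = S_{t-1} + k_t v_t^T
    o = S.T @ q                 # read-out: o_t = S_t^T q_t
    return S, o

# Illustrative dimensions (assumptions, not the paper's configuration).
d_model, d_k, d_v = 512, 64, 64
S = torch.zeros(d_k, d_v)       # recurrent state: 64 * 64 = 4,096 entries

x = torch.randn(d_model)
W_q = torch.randn(d_model, d_k) * d_model**-0.5
W_k = torch.randn(d_model, d_k) * d_model**-0.5
W_v = torch.randn(d_model, d_v) * d_model**-0.5
S, o = linear_attention_step(S, x @ W_q, x @ W_k, x @ W_v)

# Expanding the key dimension, e.g. from 64 to 256, grows the state 4x
# (256 * 64 = 16,384 entries), while the extra projection weights needed
# to produce the longer keys (d_model * 192 here) are negligible next to
# the full model's parameter count.
```

The sketch illustrates why post-training expansion is attractive: the recurrent state, not the parameter count, is what bounds recall, so enlarging the state dimensions grows memory capacity far faster than it grows the weight matrices.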
Key facts
- StateX is a post-training framework for expanding RNN states.
- It targets linear attention and state-space models.
- State expansion improves recall ability without significant parameter increase.
- Experiments were conducted on models with up to 7 billion parameters.
- The paper is available on arXiv: 2509.22630.