WriteSAE: Sparse Autoencoders for Recurrent State Editing
Researchers have introduced WriteSAE, the first sparse autoencoder designed to decompose and edit the matrix-cache write in state-space and hybrid recurrent language models, including Gated DeltaNet, Mamba-2, and RWKV-7. Unlike conventional SAEs trained on residual streams, WriteSAE reshapes each decoder atom into the native write format, which yields closed-form predictions of per-token logit shifts and supports cache-slot swaps matched in Frobenius norm. Atom substitution beats matched-norm ablation on 92.4% of 4,851 firings at Qwen3.5-0.8B L9 H4 and succeeds on 89.8% of the 87-atom population test. The closed form predicts observed effects with R² = 0.98, and Mamba-2-370M shows 88.1% substitution success across 2,500 firings. Sustained three-position installs triple mid-rank target-in-continuation from 33.3% to 100% under greedy decoding.
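To make the core idea concrete, here is a minimal NumPy sketch of decomposing a rank-1 matrix-cache write into SAE decoder atoms. All shapes, the ReLU encoder, and the variable names are illustrative assumptions, not the paper's actual architecture; the point is only that each decoder atom, reshaped back into write format, is itself a rank-style write, so the reconstruction is an exact sum of per-atom writes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v, n_atoms = 16, 16, 64  # toy dimensions (assumed)

# A recurrent "write" is the rank-1 update k v^T added to the matrix cache.
k = rng.standard_normal(d_k)
v = rng.standard_normal(d_v)
write = np.outer(k, v)  # the rank-1 cache write, shape (d_k, d_v)

# Toy SAE over the flattened write (hypothetical weights).
W_enc = rng.standard_normal((n_atoms, d_k * d_v)) / np.sqrt(d_k * d_v)
W_dec = rng.standard_normal((n_atoms, d_k * d_v)) / np.sqrt(d_k * d_v)

acts = np.maximum(W_enc @ write.ravel(), 0.0)  # sparse nonnegative code
recon = (acts @ W_dec).reshape(d_k, d_v)       # reconstruction of the write

# Each decoder atom reshaped into write format; the reconstruction is
# exactly the sum of the per-atom writes, which is what makes per-atom
# interventions (ablate or substitute one atom) well defined.
atom_writes = [a * W_dec[i].reshape(d_k, d_v) for i, a in enumerate(acts)]
assert np.allclose(sum(atom_writes), recon)
```

Because the cache update is linear in the write, removing or swapping any single atom's contribution changes the cache by a known matrix, which is what the closed-form effect predictions build on.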
Key facts
- WriteSAE is the first sparse autoencoder for matrix cache write decomposition in recurrent LLMs.
- Targets Gated DeltaNet, Mamba-2, and RWKV-7.
- Uses rank-1 updates k_t v_t^T for write operations.
- Atom substitution beats matched-norm ablation on 92.4% of 4,851 firings at Qwen3.5-0.8B L9 H4.
- 87-atom population test holds at 89.8%.
- Closed form predicts effects with R²=0.98.
- Mamba-2-370M substitutes at 88.1% over 2,500 firings.
- Sustained three-position installs lift target-in-continuation from 33.3% to 100%.
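The substitution-versus-ablation comparison and the closed-form logit shift can be sketched as follows. This is a toy linear read path, not the models' actual readout: the cache slot, query, and unembedding matrix are all random stand-ins. It shows why, when the readout is linear in the cache, the logit difference between substituting a norm-matched atom and simply ablating the write is predictable in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
d_k, d_v, vocab = 16, 16, 50  # toy dimensions (assumed)

S = rng.standard_normal((d_k, d_v))      # matrix cache slot (toy)
q = rng.standard_normal(d_k)             # read query at a later token
W_U = rng.standard_normal((vocab, d_v))  # unembedding (illustrative)

def logits(cache):
    # Toy linear read path: query reads the cache, result is unembedded.
    return W_U @ (cache.T @ q)

# A write attributed to one firing, and a decoder atom to install instead.
firing = np.outer(rng.standard_normal(d_k), rng.standard_normal(d_v))
atom = np.outer(rng.standard_normal(d_k), rng.standard_normal(d_v))

# Matched-norm ablation removes the write; atom substitution installs the
# atom scaled to the same Frobenius norm as the removed write.
scale = np.linalg.norm(firing) / np.linalg.norm(atom)
ablated = S - firing
substituted = S - firing + scale * atom

# Closed form: the readout is linear in the cache, so the logit shift of
# substitution relative to ablation is exactly the installed atom's read.
pred_shift = W_U @ ((scale * atom).T @ q)
assert np.allclose(logits(substituted) - logits(ablated), pred_shift)
```

In this linear toy the prediction is exact; the reported R² = 0.98 suggests the real models' read paths are close enough to linear in the cache for the closed form to track observed effects.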