WriteSAE: Sparse Autoencoders for Recurrent State Editing
Researchers have introduced WriteSAE, the first sparse autoencoder designed to decompose and edit the matrix-cache write in state-space and hybrid recurrent language models, including Gated DeltaNet, Mamba-2, and RWKV-7. Unlike conventional SAEs trained on residual streams, WriteSAE reshapes each decoder atom into the native write format, which yields closed-form predictions of per-token logit shifts and supports cache-slot swaps matched in Frobenius norm. Atom substitution beats matched-norm ablation on 92.4% of 4,851 firings at Qwen3.5-0.8B L9 H4 and succeeds on 89.8% of the 87-atom population test. The closed form predicts observed effects with R² = 0.98, and Mamba-2-370M shows 88.1% substitution success across 2,500 firings. Sustained three-position installs triple mid-rank target-in-continuation from 33.3% to 100% under greedy decoding.
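To make the core idea concrete, here is a minimal NumPy sketch of decomposing a rank-1 matrix-cache write into SAE decoder atoms. All shapes, the ReLU encoder, and the variable names are illustrative assumptions, not the paper's actual architecture; the point is only that each decoder atom, reshaped back into write format, is itself a rank-style write, so the reconstruction is an exact sum of per-atom writes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k, d_v, n_atoms = 16, 16, 64  # toy dimensions (assumed)

# A recurrent "write" is the rank-1 update k v^T added to the matrix cache.
k = rng.standard_normal(d_k)
v = rng.standard_normal(d_v)
write = np.outer(k, v)  # the rank-1 cache write, shape (d_k, d_v)

# Toy SAE over the flattened write (hypothetical weights).
W_enc = rng.standard_normal((n_atoms, d_k * d_v)) / np.sqrt(d_k * d_v)
W_dec = rng.standard_normal((n_atoms, d_k * d_v)) / np.sqrt(d_k * d_v)

acts = np.maximum(W_enc @ write.ravel(), 0.0)  # sparse nonnegative code
recon = (acts @ W_dec).reshape(d_k, d_v)       # reconstruction of the write

# Each decoder atom reshaped into write format; the reconstruction is
# exactly the sum of the per-atom writes, which is what makes per-atom
# interventions (ablate or substitute one atom) well defined.
atom_writes = [a * W_dec[i].reshape(d_k, d_v) for i, a in enumerate(acts)]
assert np.allclose(sum(atom_writes), recon)
```

Because the cache update is linear in the write, removing or swapping any single atom's contribution changes the cache by a known matrix, which is what the closed-form effect predictions build on.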
Key facts
- WriteSAE is the first sparse autoencoder for matrix cache write decomposition in recurrent LLMs.
- Targets Gated DeltaNet, Mamba-2, and RWKV-7.
- Uses rank-1 updates k_t v_t^T for write operations.
- Atom substitution beats matched-norm ablation on 92.4% of 4,851 firings at Qwen3.5-0.8B L9 H4.
- 87-atom population test holds at 89.8%.
- Closed form predicts effects with R²=0.98.
- Mamba-2-370M substitutes at 88.1% over 2,500 firings.
- Sustained three-position installs lift target-in-continuation from 33.3% to 100%.
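The substitution-versus-ablation comparison and the closed-form logit shift can be sketched as follows. This is a toy linear read path, not the models' actual readout: the cache slot, query, and unembedding matrix are all random stand-ins. It shows why, when the readout is linear in the cache, the logit difference between substituting a norm-matched atom and simply ablating the write is predictable in closed form.

```python
import numpy as np

rng = np.random.default_rng(1)
d_k, d_v, vocab = 16, 16, 50  # toy dimensions (assumed)

S = rng.standard_normal((d_k, d_v))      # matrix cache slot (toy)
q = rng.standard_normal(d_k)             # read query at a later token
W_U = rng.standard_normal((vocab, d_v))  # unembedding (illustrative)

def logits(cache):
    # Toy linear read path: query reads the cache, result is unembedded.
    return W_U @ (cache.T @ q)

# A write attributed to one firing, and a decoder atom to install instead.
firing = np.outer(rng.standard_normal(d_k), rng.standard_normal(d_v))
atom = np.outer(rng.standard_normal(d_k), rng.standard_normal(d_v))

# Matched-norm ablation removes the write; atom substitution installs the
# atom scaled to the same Frobenius norm as the removed write.
scale = np.linalg.norm(firing) / np.linalg.norm(atom)
ablated = S - firing
substituted = S - firing + scale * atom

# Closed form: the readout is linear in the cache, so the logit shift of
# substitution relative to ablation is exactly the installed atom's read.
pred_shift = W_U @ ((scale * atom).T @ q)
assert np.allclose(logits(substituted) - logits(ablated), pred_shift)
```

In this linear toy the prediction is exact; the reported R² = 0.98 suggests the real models' read paths are close enough to linear in the cache for the closed form to track observed effects.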