Gated DeltaNet-2 Decouples Erase and Write in Linear Attention

other · 2026-05-23

A recent arXiv paper presents Gated DeltaNet-2, a linear attention mechanism that separates erasing from writing by giving each its own channel-wise gate. Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, which makes sequence mixing linear in time and decoding constant in memory. The difficulty lies in editing this compressed memory without scrambling the associations it already holds. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay. Nonetheless, the active edit in these models still relies on a single scalar gate for both erasing and writing. Gated DeltaNet-2 generalizes Gated DeltaNet and KDA by decoupling the two roles with a channel-wise erase gate b_t and a channel-wise write gate.

Key facts

Gated DeltaNet-2 is introduced in arXiv paper 2605.22791.
Linear attention replaces softmax attention's unbounded cache with a fixed-size recurrent state.
It reduces sequence mixing to linear time and decoding to constant memory.
Delta-rule models subtract the current read before writing a new value.
Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay.
Previous models used a single scalar gate to control both erasing and writing.
Gated DeltaNet-2 separates erasing and writing with a channel-wise erase gate b_t and a channel-wise write gate.
It generalizes both Gated DeltaNet and KDA.
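
The difference between a single scalar gate and decoupled channel-wise gates can be illustrated with a toy recurrent update. This is a minimal sketch and not the paper's formulation: the function names, the state layout (keys as rows, values as columns), the choice to gate along key channels, the order of decay before the edit, and the `write` argument (the summary gives no symbol for the write gate) are all assumptions made for illustration.

```python
def read(S, k):
    """Return what the memory currently stores for key k: r = S^T k."""
    return [sum(S[i][j] * k[i] for i in range(len(k))) for j in range(len(S[0]))]


def gated_delta_step(S, k, v, decay, erase, write):
    """One recurrent update with separate per-channel erase and write gates.

    S     : d_k x d_v state, as a list of rows (assumed layout)
    k, v  : key (length d_k) and value (length d_v)
    decay : per-key-channel decay, standing in for KDA-style forgetting
    erase : per-key-channel erase gate (b_t in the summary)
    write : per-key-channel write gate (hypothetical name)
    """
    # 1. Channel-wise decay of the old memory.
    S = [[decay[i] * s for s in row] for i, row in enumerate(S)]
    # 2. Read what the decayed memory returns for this key.
    r = read(S, k)
    # 3. Subtract the old read and add the new value, each with its own gate.
    return [
        [S[i][j] - erase[i] * k[i] * r[j] + write[i] * k[i] * v[j]
         for j in range(len(v))]
        for i in range(len(k))
    ]


def scalar_delta_step(S, k, v, beta):
    """Delta rule with one scalar beta gating both erase and write, no decay."""
    ones = [1.0] * len(k)
    gate = [beta] * len(k)
    return gated_delta_step(S, k, v, ones, gate, gate)


S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]

# Scalar gate: writing a second value under the same key overwrites the first.
S = scalar_delta_step(S, k, [2.0, 3.0], beta=1.0)
S = scalar_delta_step(S, k, [5.0, 7.0], beta=1.0)
print(read(S, k))  # [5.0, 7.0]

# Decoupled gates: erase the stored value without writing a new one.
S = gated_delta_step(S, k, [9.0, 9.0],
                     decay=[1.0, 1.0], erase=[1.0, 1.0], write=[0.0, 0.0])
print(read(S, k))  # [0.0, 0.0]
```

In this sketch, setting `erase` and `write` to the same constant vector and `decay` to ones recovers the scalar-gate delta rule, which is the sense in which a decoupled update can contain the single-gate one as a special case.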
