Graph Memory Transformer Replaces FFN with Learned Memory Graph
A new study on arXiv introduces the Graph Memory Transformer (GMT), a decoder-only language model that replaces the standard Feed-Forward Network (FFN) sublayer with a learned memory graph. GMT keeps causal self-attention but routes each token representation through a memory cell built from a learned bank of centroids connected by a directed transition matrix. In the base configuration, GMT v7, each of the 16 transformer blocks carries 128 centroids and a 128x128 edge matrix, combined with gravitational source routing and token-conditioned target selection. Rather than retrieving values, the memory cell moves from an estimated source to a target memory state. The model has 82.2 million trainable parameters and contains no dense FFN sublayers. The paper is available on arXiv under ID 2604.23862.
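The announcement does not reproduce the paper's equations, so the following is only a minimal PyTorch sketch of the mechanism as described: a bank of centroids, a directed edge matrix, a soft source estimate, and token-conditioned target selection whose resulting memory state replaces the FFN output. The class name GraphMemoryCell, the linear projections, and the use of plain softmax similarity as a stand-in for the paper's gravitational source routing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphMemoryCell(nn.Module):
    """Sketch of a memory cell standing in for the FFN sublayer.

    Assumptions (not from the paper): softmax similarity approximates the
    gravitational source routing, and a learned output projection maps the
    target memory state back into the residual stream.
    """

    def __init__(self, d_model: int, n_centroids: int = 128):
        super().__init__()
        # Learned bank of centroids (memory states).
        self.centroids = nn.Parameter(torch.randn(n_centroids, d_model) * 0.02)
        # Directed edge matrix: transition logits between centroids (128x128).
        self.edges = nn.Parameter(torch.zeros(n_centroids, n_centroids))
        # Token-conditioned contribution to target selection (assumed form).
        self.target_proj = nn.Linear(d_model, n_centroids)
        # Projection of the target memory state back to d_model (assumed).
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        # 1) Estimate a soft source distribution over centroids.
        source_logits = x @ self.centroids.t()            # (B, S, C)
        source = F.softmax(source_logits, dim=-1)

        # 2) Token-conditioned target selection: combine the source's
        #    outgoing edge logits with a projection of the token itself.
        edge_logits = source @ self.edges                  # (B, S, C)
        target = F.softmax(edge_logits + self.target_proj(x), dim=-1)

        # 3) Move to the target memory state (convex combination of
        #    centroids) and project it into the residual stream.
        target_state = target @ self.centroids              # (B, S, d_model)
        return self.out_proj(target_state)


# Usage sketch: drop the cell in where an FFN sublayer would normally sit.
cell = GraphMemoryCell(d_model=512, n_centroids=128)  # d_model is assumed
hidden = torch.randn(2, 16, 512)
out = hidden + cell(hidden)  # residual connection, as in a standard block
```

In this sketch the cell never "reads out" stored values the way an FFN or key-value memory would; it only selects which centroid the token should transition to, which matches the paper's framing of moving from a source to a target memory state.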
Key facts
- Graph Memory Transformer (GMT) proposed on arXiv
- Replaces FFN sublayer with learned memory graph
- Decoder-only architecture with causal self-attention
- Memory cell with 128 centroids per block
- 16 transformer blocks in base GMT v7
- 128x128 edge matrix per block
- 82.2M trainable parameters
- No dense FFN sublayers
Entities
Institutions
- arXiv