Graph Memory Transformer Replaces FFN with Learned Memory Graph
A new study on arXiv introduces the Graph Memory Transformer (GMT), a decoder-only language model that replaces the standard Feed-Forward Network (FFN) sublayer with a learned memory graph. GMT keeps causal self-attention but routes each token representation through a memory cell built from a learned bank of centroids connected by a directed transition matrix. In the base configuration, GMT v7, each of the 16 transformer blocks carries 128 centroids and a 128x128 edge matrix, combined with gravitational source routing and token-conditioned target selection. Rather than retrieving values, the memory cell moves from an estimated source to a target memory state. The model has 82.2 million trainable parameters and contains no dense FFN sublayers. The paper is available on arXiv under ID 2604.23862.
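The announcement does not reproduce the paper's equations, so the following is only a minimal PyTorch sketch of the mechanism as described: a bank of centroids, a directed edge matrix, a soft source estimate, and token-conditioned target selection whose resulting memory state replaces the FFN output. The class name GraphMemoryCell, the linear projections, and the use of plain softmax similarity as a stand-in for the paper's gravitational source routing are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphMemoryCell(nn.Module):
    """Sketch of a memory cell standing in for the FFN sublayer.

    Assumptions (not from the paper): softmax similarity approximates the
    gravitational source routing, and a learned output projection maps the
    target memory state back into the residual stream.
    """

    def __init__(self, d_model: int, n_centroids: int = 128):
        super().__init__()
        # Learned bank of centroids (memory states).
        self.centroids = nn.Parameter(torch.randn(n_centroids, d_model) * 0.02)
        # Directed edge matrix: transition logits between centroids (128x128).
        self.edges = nn.Parameter(torch.zeros(n_centroids, n_centroids))
        # Token-conditioned contribution to target selection (assumed form).
        self.target_proj = nn.Linear(d_model, n_centroids)
        # Projection of the target memory state back to d_model (assumed).
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        # 1) Estimate a soft source distribution over centroids.
        source_logits = x @ self.centroids.t()            # (B, S, C)
        source = F.softmax(source_logits, dim=-1)

        # 2) Token-conditioned target selection: combine the source's
        #    outgoing edge logits with a projection of the token itself.
        edge_logits = source @ self.edges                  # (B, S, C)
        target = F.softmax(edge_logits + self.target_proj(x), dim=-1)

        # 3) Move to the target memory state (convex combination of
        #    centroids) and project it into the residual stream.
        target_state = target @ self.centroids              # (B, S, d_model)
        return self.out_proj(target_state)


# Usage sketch: drop the cell in where an FFN sublayer would normally sit.
cell = GraphMemoryCell(d_model=512, n_centroids=128)  # d_model is assumed
hidden = torch.randn(2, 16, 512)
out = hidden + cell(hidden)  # residual connection, as in a standard block
```

In this sketch the cell never "reads out" stored values the way an FFN or key-value memory would; it only selects which centroid the token should transition to, which matches the paper's framing of moving from a source to a target memory state.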
Key facts
- Graph Memory Transformer (GMT) proposed on arXiv
- Replaces FFN sublayer with learned memory graph
- Decoder-only architecture with causal self-attention
- Memory cell with 128 centroids per block
- 16 transformer blocks in base GMT v7
- 128x128 edge matrix per block
- 82.2M trainable parameters
- No dense FFN sublayers
Entities
Institutions
- arXiv