ARTFEED — Contemporary Art Intelligence

Graph Memory Transformer Replaces FFN with Learned Memory Graph

ai-technology · 2026-04-29

A new arXiv study introduces the Graph Memory Transformer (GMT), a decoder-only language model that replaces the standard feed-forward network (FFN) sublayer with a learned memory graph. GMT retains causal self-attention but routes token representations through a memory cell built from a learned bank of centroids connected by a directed transition matrix. In the base configuration, GMT v7, each of the 16 transformer blocks holds 128 centroids and a 128x128 edge matrix, with gravitational source routing and token-conditioned target selection. Rather than retrieving values, the memory cell transitions from an estimated source to a target memory state. The model has 82.2 million trainable parameters and contains no dense FFN sublayers. The paper is available on arXiv under ID 2604.23862.
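To make the mechanism concrete, here is a minimal sketch of one such memory cell in numpy. The function names, the similarity-based routing, and the way the edge matrix is applied are all assumptions for illustration; the paper's actual "gravitational" routing and target selection may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_centroids = 64, 128  # 128 centroids per block, as in GMT v7

# Hypothetical learned parameters: a centroid bank and a directed
# 128x128 edge (transition) matrix, one pair per transformer block.
centroids = rng.standard_normal((n_centroids, d_model)) / np.sqrt(d_model)
edges = rng.standard_normal((n_centroids, n_centroids))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_cell(h):
    """Stand-in for the FFN sublayer: estimate a source distribution
    over centroids, follow learned directed edges to a target
    distribution, and return the target memory state."""
    # Source routing: token-to-centroid similarity (an assumed proxy
    # for the paper's gravitational source routing).
    src = softmax(h @ centroids.T)          # shape (n_centroids,)
    # Token-conditioned target selection via the edge matrix.
    tgt = softmax(src @ edges)              # shape (n_centroids,)
    # Move to a target memory state rather than retrieving values.
    return tgt @ centroids                  # shape (d_model,)

h = rng.standard_normal(d_model)  # one token representation
out = memory_cell(h)
```

The key structural point the sketch preserves is that the sublayer's output is a state reached by traversing the graph, not a value looked up from it.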

Key facts

  • Graph Memory Transformer (GMT) proposed on arXiv
  • Replaces FFN sublayer with learned memory graph
  • Decoder-only architecture with causal self-attention
  • Memory cell with 128 centroids per block
  • 16 transformer blocks in base GMT v7
  • 128x128 edge matrix per block
  • 82.2M trainable parameters
  • No dense FFN sublayers

Entities

Institutions

  • arXiv

Sources