MELT: Memory-Efficient Looped Transformer for Recurrent LLMs
arXiv paper 2605.07721 introduces MELT (Memory-Efficient Looped Transformer), a novel architecture for recurrent large language models that decouples reasoning depth from memory consumption. Unlike models such as Ouro, which accumulate a standard Key-Value (KV) cache across iterations and therefore see memory grow linearly with reasoning depth, MELT maintains a single KV cache per layer that is shared across reasoning loops. This cache is updated via a learnable gating mechanism, enabling stable and efficient multi-step computation without prohibitive memory usage. The approach addresses a key scalability limitation of recurrent LLMs, allowing deeper reasoning without proportional memory costs.
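The paper does not spell out the gate's exact form here, but the shared-cache idea can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the gate matrices `W_k`/`W_v`, the sigmoid gating over a linear projection, and the elementwise blend of new keys/values into the fixed-size cache are all assumptions chosen to show the mechanism's shape.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedKVCache:
    """Hypothetical per-layer KV cache shared across reasoning loops.

    Instead of appending new keys/values each loop (linear growth),
    a learnable gate blends them into a fixed-size cache.
    W_k and W_v stand in for learnable gate parameters.
    """
    def __init__(self, seq_len, d_model, rng):
        self.k = np.zeros((seq_len, d_model))
        self.v = np.zeros((seq_len, d_model))
        self.W_k = rng.standard_normal((d_model, d_model)) * 0.02
        self.W_v = rng.standard_normal((d_model, d_model)) * 0.02

    def update(self, new_k, new_v):
        # Gate in [0, 1] decides how much of the new KV overwrites the cache.
        g_k = sigmoid(new_k @ self.W_k)
        g_v = sigmoid(new_v @ self.W_v)
        # Convex blend keeps the cache the same shape: no growth per loop.
        self.k = g_k * new_k + (1.0 - g_k) * self.k
        self.v = g_v * new_v + (1.0 - g_v) * self.v
        return self.k, self.v

rng = np.random.default_rng(0)
cache = GatedKVCache(seq_len=8, d_model=16, rng=rng)
for loop in range(4):  # four reasoning iterations, one shared cache
    new_k = rng.standard_normal((8, 16))
    new_v = rng.standard_normal((8, 16))
    k, v = cache.update(new_k, new_v)
print(k.shape)  # cache size is independent of the number of loops
```

The key property is that `update` returns tensors of the same shape regardless of how many loops have run, which is what decouples reasoning depth from memory.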
Key facts
- MELT decouples compute from memory in looped language models.
- Standard recurrent LLMs such as Ouro exhibit memory consumption that grows linearly with reasoning depth.
- MELT uses a single KV cache per layer shared across reasoning loops.
- The KV cache is updated via a learnable gating mechanism.
- The architecture enables stable and efficient multi-step computation.
- The paper is on arXiv with ID 2605.07721.
- The approach improves practical scalability of recurrent LLMs.
- MELT allows increasing reasoning iterations without prohibitive memory growth.
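The scalability claim above can be made concrete with back-of-the-envelope arithmetic. The sizes below (sequence length, model width, fp16 storage) are illustrative assumptions, not figures from the paper; the point is only the linear-vs-constant contrast.

```python
def kv_memory_standard(loops, seq_len, d_model, bytes_per=2):
    # Accumulating cache: one K and one V tensor retained per loop,
    # so memory is linear in the number of reasoning iterations.
    return loops * 2 * seq_len * d_model * bytes_per

def kv_memory_shared(loops, seq_len, d_model, bytes_per=2):
    # Shared gated cache: one K and one V tensor total,
    # independent of the number of loops.
    return 2 * seq_len * d_model * bytes_per

# Hypothetical single-layer example: seq_len=4096, d_model=4096, fp16.
print(kv_memory_standard(8, 4096, 4096))  # grows 8x with 8 loops
print(kv_memory_shared(8, 4096, 4096))    # constant
```

With these assumed sizes, eight reasoning loops cost eight times the memory under an accumulating cache but the same memory as one loop under a shared cache.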
Entities
Institutions
- arXiv