MELT: Memory-Efficient Looped Transformer for Recurrent LLMs
arXiv paper 2605.07721 introduces MELT (Memory-Efficient Looped Transformer), a novel architecture for recurrent large language models that decouples reasoning depth from memory consumption. Unlike models such as Ouro, which accumulate a standard Key-Value (KV) cache across iterations and therefore see memory grow linearly with reasoning depth, MELT maintains a single KV cache per layer that is shared across reasoning loops. This cache is updated via a learnable gating mechanism, enabling stable and efficient multi-step computation without prohibitive memory usage. The approach addresses a key scalability limitation of recurrent LLMs, allowing deeper reasoning without proportional memory costs.
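The paper does not spell out the gate's exact form here, but the shared-cache idea can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: the gate matrices `W_k`/`W_v`, the sigmoid gating over a linear projection, and the elementwise blend of new keys/values into the fixed-size cache are all assumptions chosen to show the mechanism's shape.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GatedKVCache:
    """Hypothetical per-layer KV cache shared across reasoning loops.

    Instead of appending new keys/values each loop (linear growth),
    a learnable gate blends them into a fixed-size cache.
    W_k and W_v stand in for learnable gate parameters.
    """
    def __init__(self, seq_len, d_model, rng):
        self.k = np.zeros((seq_len, d_model))
        self.v = np.zeros((seq_len, d_model))
        self.W_k = rng.standard_normal((d_model, d_model)) * 0.02
        self.W_v = rng.standard_normal((d_model, d_model)) * 0.02

    def update(self, new_k, new_v):
        # Gate in [0, 1] decides how much of the new KV overwrites the cache.
        g_k = sigmoid(new_k @ self.W_k)
        g_v = sigmoid(new_v @ self.W_v)
        # Convex blend keeps the cache the same shape: no growth per loop.
        self.k = g_k * new_k + (1.0 - g_k) * self.k
        self.v = g_v * new_v + (1.0 - g_v) * self.v
        return self.k, self.v

rng = np.random.default_rng(0)
cache = GatedKVCache(seq_len=8, d_model=16, rng=rng)
for loop in range(4):  # four reasoning iterations, one shared cache
    new_k = rng.standard_normal((8, 16))
    new_v = rng.standard_normal((8, 16))
    k, v = cache.update(new_k, new_v)
print(k.shape)  # cache size is independent of the number of loops
```

The key property is that `update` returns tensors of the same shape regardless of how many loops have run, which is what decouples reasoning depth from memory.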
Key facts
- MELT decouples compute from memory in looped language models.
- Standard recurrent LLMs such as Ouro exhibit memory consumption that grows linearly with reasoning depth.
- MELT uses a single KV cache per layer shared across reasoning loops.
- The KV cache is updated via a learnable gating mechanism.
- The architecture enables stable and efficient multi-step computation.
- The paper is on arXiv with ID 2605.07721.
- The approach improves practical scalability of recurrent LLMs.
- MELT allows increasing reasoning iterations without prohibitive memory growth.
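The scalability claim above can be made concrete with back-of-the-envelope arithmetic. The sizes below (sequence length, model width, fp16 storage) are illustrative assumptions, not figures from the paper; the point is only the linear-vs-constant contrast.

```python
def kv_memory_standard(loops, seq_len, d_model, bytes_per=2):
    # Accumulating cache: one K and one V tensor retained per loop,
    # so memory is linear in the number of reasoning iterations.
    return loops * 2 * seq_len * d_model * bytes_per

def kv_memory_shared(loops, seq_len, d_model, bytes_per=2):
    # Shared gated cache: one K and one V tensor total,
    # independent of the number of loops.
    return 2 * seq_len * d_model * bytes_per

# Hypothetical single-layer example: seq_len=4096, d_model=4096, fp16.
print(kv_memory_standard(8, 4096, 4096))  # grows 8x with 8 loops
print(kv_memory_shared(8, 4096, 4096))    # constant
```

With these assumed sizes, eight reasoning loops cost eight times the memory under an accumulating cache but the same memory as one loop under a shared cache.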
Entities
Institutions
- arXiv