CoMeT Architecture Enables LLMs to Process Arbitrarily Long Sequences with Constant Memory
The Collaborative Memory Transformer (CoMeT) is a new architecture designed to tackle the quadratic complexity and growing key-value cache that impede long-context processing in standard Transformers. This plug-in module enables large language models to handle sequences of arbitrary length with constant memory usage and linear time complexity. CoMeT uses a dual-memory design: a FIFO queue serves as temporary memory for recent events, while a global memory with a gated update mechanism captures long-range dependencies. Together these memories act as a dynamic soft prompt for subsequent data chunks. CoMeT can be integrated into pre-trained models with minimal fine-tuning, and the authors also introduce a novel layer-level pipeline parallelism strategy that makes fine-tuning on extremely long contexts efficient. Details are published in arXiv preprint 2602.01766v2.
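The dual-memory update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the scalar gate, and the averaging rule for folding evicted queue entries into global memory are all assumptions, and real chunk representations would be tensors rather than lists of floats. It only shows why memory stays constant: the FIFO queue is bounded, and the global memory is a fixed-size vector updated in place.

```python
from collections import deque

class DualMemory:
    """Hypothetical sketch of CoMeT's dual-memory mechanism (names assumed)."""

    def __init__(self, queue_size: int, dim: int, gate: float = 0.9):
        self.queue = deque(maxlen=queue_size)  # temporary FIFO memory for recent chunks
        self.global_mem = [0.0] * dim          # fixed-size global memory (long-range)
        self.gate = gate                       # gating coefficient (scalar for illustration)

    def update(self, chunk_repr: list[float]) -> None:
        # When the queue is full, its oldest entry is folded into the
        # global memory via a gated update before being evicted.
        if len(self.queue) == self.queue.maxlen:
            evicted = self.queue[0]
            self.global_mem = [
                self.gate * g + (1.0 - self.gate) * e
                for g, e in zip(self.global_mem, evicted)
            ]
        self.queue.append(chunk_repr)  # deque drops the oldest entry automatically

    def soft_prompt(self) -> list[list[float]]:
        # Both memories are prepended to the next chunk as a dynamic soft
        # prompt; its size is constant regardless of how much was processed.
        return [self.global_mem] + list(self.queue)
```

However many chunks are processed, `soft_prompt()` always returns at most `1 + queue_size` vectors, which is the constant-memory property the paper claims.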
Key facts
- CoMeT enables LLMs to handle arbitrarily long sequences with constant memory usage
- It operates with linear time complexity
- The architecture uses a dual-memory system: temporary FIFO queue and global gated memory
- Memories act as dynamic soft prompts for subsequent data chunks
- CoMeT can be integrated into pre-trained models with minimal fine-tuning
- A novel layer-level pipeline parallelism strategy enables efficient fine-tuning on long contexts
- The approach addresses quadratic complexity and growing KV cache issues in standard Transformers
- Details are published in arXiv preprint 2602.01766v2