CoMeT Architecture Enables LLMs to Process Arbitrarily Long Sequences with Constant Memory
The Collaborative Memory Transformer (CoMeT) is a new architecture designed to tackle the quadratic complexity and growing key-value cache that impede long-context processing in standard Transformers. This plug-in module enables large language models to handle sequences of arbitrary length with constant memory usage and linear time complexity. CoMeT uses a dual-memory design: a FIFO queue serves as temporary memory for recent events, while a global memory with a gated update mechanism captures long-range dependencies. Together these memories act as a dynamic soft prompt for subsequent data chunks. CoMeT can be integrated into pre-trained models with minimal fine-tuning, and the authors also introduce a novel layer-level pipeline parallelism strategy that makes fine-tuning on extremely long contexts efficient. Details are published in arXiv preprint 2602.01766v2.
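The dual-memory update described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name, the scalar gate, and the averaging rule for folding evicted queue entries into global memory are all assumptions, and real chunk representations would be tensors rather than lists of floats. It only shows why memory stays constant: the FIFO queue is bounded, and the global memory is a fixed-size vector updated in place.

```python
from collections import deque

class DualMemory:
    """Hypothetical sketch of CoMeT's dual-memory mechanism (names assumed)."""

    def __init__(self, queue_size: int, dim: int, gate: float = 0.9):
        self.queue = deque(maxlen=queue_size)  # temporary FIFO memory for recent chunks
        self.global_mem = [0.0] * dim          # fixed-size global memory (long-range)
        self.gate = gate                       # gating coefficient (scalar for illustration)

    def update(self, chunk_repr: list[float]) -> None:
        # When the queue is full, its oldest entry is folded into the
        # global memory via a gated update before being evicted.
        if len(self.queue) == self.queue.maxlen:
            evicted = self.queue[0]
            self.global_mem = [
                self.gate * g + (1.0 - self.gate) * e
                for g, e in zip(self.global_mem, evicted)
            ]
        self.queue.append(chunk_repr)  # deque drops the oldest entry automatically

    def soft_prompt(self) -> list[list[float]]:
        # Both memories are prepended to the next chunk as a dynamic soft
        # prompt; its size is constant regardless of how much was processed.
        return [self.global_mem] + list(self.queue)
```

However many chunks are processed, `soft_prompt()` always returns at most `1 + queue_size` vectors, which is the constant-memory property the paper claims.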
Key facts
- CoMeT enables LLMs to handle arbitrarily long sequences with constant memory usage
- It operates with linear time complexity
- The architecture uses a dual-memory system: temporary FIFO queue and global gated memory
- Memories act as dynamic soft prompts for subsequent data chunks
- CoMeT can be integrated into pre-trained models with minimal fine-tuning
- A novel layer-level pipeline parallelism strategy enables efficient fine-tuning on long contexts
- The approach addresses quadratic complexity and growing KV cache issues in standard Transformers
- Details are published in arXiv preprint 2602.01766v2