Absorber LLM: Causal Synchronization for Efficient Long-Context Inference
Researchers propose Absorber LLM, a method that formulates long-context retention as self-supervised causal synchronization. The approach absorbs historical context into the model's parameters so that a contextless model reproduces the full-context model's future generations. This addresses the growing memory and compute cost of self-attention in transformers and avoids the limitations of existing constant-memory alternatives: RNNs and SSMs lose long-tail dependencies, while Test-Time Training (TTT) overfits token-level projection and fails to preserve the context's causal effect on future outputs. Experiments on long-context tasks demonstrate the method's effectiveness. The paper is available on arXiv.
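The summary does not state the objective formally; a minimal sketch of what the causal-synchronization condition could look like, with the full-context model as teacher and the contextless, absorbing model as student (all notation here is assumed, not taken from the paper):

```latex
% Assumed notation: c = historical context, x_{1:T} = future tokens,
% p_theta = the original model conditioned on the full context,
% q_phi   = the contextless model whose parameters phi absorb c.
\phi^{\star}(c) = \arg\min_{\phi}\;
  \mathbb{E}_{x_{1:T} \sim p_{\theta}(\cdot \mid c)}
  \left[ \sum_{t=1}^{T}
    D_{\mathrm{KL}}\big( p_{\theta}(\cdot \mid c,\, x_{<t}) \,\|\, q_{\phi}(\cdot \mid x_{<t}) \big)
  \right]
```

Read this way, the objective is self-supervised (the targets are the teacher's own sampled continuations, not labels), and once phi is fitted the historical context c can be dropped at inference time.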
Key facts
- Absorber LLM uses causal synchronization for test-time training (see the sketch after this list).
- It addresses the high memory consumption of self-attention in transformers.
- Constant-memory alternatives like RNNs and SSMs lose long-tail dependencies.
- TTT methods overfit to token-level projection and fail to preserve the context's causal effect on future generations.
- The method absorbs historical contexts into parameters.
- A contextless model matches the original model with full context on future generations.
- Experiments show effectiveness on long-context tasks.
- The paper is on arXiv with ID 2604.20915.
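The key facts above say that absorption happens via test-time training but give no algorithm. The sketch below is a hypothetical PyTorch rendering of such a loop under those assumptions: a toy model stands in for the LLM, and all names (`TinyLM`, `sample_continuation`, `absorb`) and hyperparameters are illustrative, not taken from the paper.

```python
# Minimal sketch (assumptions, not the paper's implementation): a toy causal LM
# stands in for both the full-context "teacher" and the contextless "student".
# The student's parameters are updated at test time so that its next-token
# distributions on teacher-sampled continuations match the teacher's, i.e. the
# historical context is absorbed into the student's weights.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 256, 64

class TinyLM(nn.Module):
    """Toy stand-in for a causal LM: logits depend only on the current token."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                  # tokens: (batch, length)
        return self.head(self.emb(tokens))     # logits: (batch, length, vocab)

@torch.no_grad()
def sample_continuation(model, context, steps):
    """Sample `steps` future tokens from the model, conditioned on `context`."""
    seq = context.clone()
    for _ in range(steps):
        logits = model(seq)[:, -1]
        nxt = torch.multinomial(F.softmax(logits, dim=-1), 1)
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, context.shape[1]:]            # only the generated future tokens

def absorb(teacher, student, context, rollouts=8, steps=32, lr=1e-3):
    """Test-time training: make the contextless student match the teacher's
    future generations (a KL-style causal-synchronization loss, assumed here)."""
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(rollouts):
        future = sample_continuation(teacher, context, steps)
        with torch.no_grad():                   # teacher sees context + future
            full = torch.cat([context, future], dim=1)
            t_logits = teacher(full)[:, context.shape[1]:]
        s_logits = student(future)              # student sees only the future tokens
        loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.softmax(t_logits, dim=-1),
                        reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

teacher = TinyLM()
student = copy.deepcopy(teacher)                # start from the same weights
context = torch.randint(0, VOCAB, (1, 512))     # long history to absorb
student = absorb(teacher, student, context)
# After absorption the student generates with no context held in memory.
print(sample_continuation(student, torch.randint(0, VOCAB, (1, 1)), 16))
```

The design point worth noting, on the summary's description, is that the loss is computed on the teacher's future generations rather than on the context tokens themselves; this is what would distinguish causal synchronization from the token-level TTT objectives the paper criticizes.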
Entities
Institutions
- arXiv