Training-Free N-Gram Memory Module Enhances LLM Knowledge Retrieval
Researchers have developed N-gram Memory (NGM), a memory module for large language models that requires no additional training. The framework has two components. A Causal N-Gram Encoder builds N-gram representations by averaging the model's existing pretrained token embeddings, so no separate memory table or retrieval pipeline is needed. A Cosine-Gated Memory Injector then injects these representations into the model through a non-parametric gate that combines cosine similarity with a ReLU. Because NGM decouples knowledge storage from neural computation, it is reported to be more efficient than Mixture of Experts (MoE) models and learned memory embeddings, and it can be plugged into existing models without modification.
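To make the encoder idea concrete, here is a minimal sketch of causal N-gram averaging over a pretrained embedding table. It is not the authors' implementation. The function name `causal_ngram_encode`, the single fixed window size `n`, and the choice to shrink the window at the start of the sequence are all assumptions made for illustration. "Causal" is read here as "each position averages only the current token and the tokens before it".

```python
from typing import List, Sequence

Vector = List[float]


def causal_ngram_encode(
    token_ids: Sequence[int],
    embedding_table: Sequence[Vector],
    n: int,
) -> List[Vector]:
    """Hypothetical sketch: one N-gram vector per position.

    Each output is the mean of the pretrained embeddings of the current token
    and up to n-1 preceding tokens. Future tokens are never used (causal).
    Assumption: near the start of the sequence the window is simply shorter.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    dim = len(embedding_table[0])
    outputs: List[Vector] = []
    for t in range(len(token_ids)):
        start = max(0, t - n + 1)  # causal window covers positions start..t
        window = token_ids[start:t + 1]
        summed = [0.0] * dim
        for tok in window:
            for d, value in enumerate(embedding_table[tok]):
                summed[d] += value
        # Average instead of learning new parameters: no extra embeddings needed.
        outputs.append([s / len(window) for s in summed])
    return outputs
```

With a toy table `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, tokens `[0, 1, 2]`, and `n = 2`, the three positions average `{0}`, `{0, 1}`, and `{1, 2}`. Because the vectors are reused from the existing embedding table, they share the model's embedding dimension, which is what allows the module to skip a separate memory table.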
Key facts
- NGM is a plug-and-play memory module for LLMs
- It requires no additional training
- Consists of Causal N-Gram Encoder and Cosine-Gated Memory Injector
- Encoder averages pretrained token embeddings to form N-gram representations
- No separate memory table or retrieval pipeline needed
- Cosine gate with ReLU is non-parametric
- Decouples knowledge storage from neural computation
- More efficient than MoE and learned memory embeddings
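The non-parametric gate can be illustrated with a short sketch. The source states only that the injector uses a cosine gate with ReLU and no learned parameters. The exact formula below, `h' = h + ReLU(cos(h, m)) * m`, where `h` is a hidden state and `m` is the N-gram memory vector for the same position, is an assumption. The names `cosine_gate` and `cosine_gated_inject` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_gate(hidden: Sequence[float], memory: Sequence[float]) -> float:
    """ReLU(cosine similarity): a gate in [0, 1] with no learned weights."""
    dot = sum(h * m for h, m in zip(hidden, memory))
    norm_h = math.sqrt(sum(h * h for h in hidden))
    norm_m = math.sqrt(sum(m * m for m in memory))
    if norm_h == 0.0 or norm_m == 0.0:
        return 0.0  # an undefined direction contributes nothing
    # ReLU closes the gate whenever memory points away from the hidden state.
    return max(0.0, dot / (norm_h * norm_m))


def cosine_gated_inject(hidden: Sequence[float], memory: Sequence[float]) -> Vector:
    """Assumed injection rule: h' = h + ReLU(cos(h, m)) * m."""
    gate = cosine_gate(hidden, memory)
    return [h + gate * m for h, m in zip(hidden, memory)]
```

In this reading, memory aligned with the current hidden state is added at full strength, orthogonal or opposing memory is ignored, and the gate itself stores nothing. That is one way to realize the "decouples knowledge storage from neural computation" point above: knowledge lives in the frozen embeddings, and the gate only decides how much of it to let through.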