Training-Free N-Gram Memory Module Enhances LLM Knowledge Retrieval

ai-technology · 2026-05-20

Researchers have proposed N-gram Memory (NGM), a plug-and-play memory module for large language models that requires no additional training. It has two components. The Causal N-Gram Encoder builds N-gram representations by averaging the model's pretrained token embeddings, so no new embeddings, separate memory table, or retrieval pipeline is needed. The Cosine-Gated Memory Injector then performs memory injection through a non-parametric cosine gate combined with ReLU. According to the authors, this design decouples knowledge storage from neural computation, is more efficient than Mixture-of-Experts (MoE) models and learned memory embeddings, and integrates easily with existing models.
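The two components can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the fixed window size `n`, the choice to gate with ReLU of the cosine similarity between the hidden state and the memory vector, and the additive injection are all assumptions made for illustration, since the summary does not give the exact formulas.

```python
import numpy as np


def causal_ngram_encode(token_emb, n):
    """Hypothetical encoder: average each token's embedding with up to
    n - 1 preceding embeddings, so position t never sees tokens after t."""
    token_emb = np.asarray(token_emb, dtype=float)
    seq_len = token_emb.shape[0]
    out = np.empty_like(token_emb)
    for t in range(seq_len):
        start = max(0, t - n + 1)  # window is shorter at the sequence start
        out[t] = token_emb[start:t + 1].mean(axis=0)
    return out


def cosine_gated_inject(hidden, memory, eps=1e-8):
    """Hypothetical injector: add the memory vector to the hidden state,
    scaled by ReLU(cosine similarity). The gate has no learned parameters."""
    hidden = np.asarray(hidden, dtype=float)
    memory = np.asarray(memory, dtype=float)
    dot = np.sum(hidden * memory, axis=-1)
    norms = np.linalg.norm(hidden, axis=-1) * np.linalg.norm(memory, axis=-1)
    gate = np.maximum(dot / (norms + eps), 0.0)  # ReLU zeroes misaligned memory
    return hidden + gate[:, None] * memory


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(6, 4))  # stand-in for pretrained token embeddings
    hidden_states = rng.normal(size=(6, 4))  # stand-in for a layer's hidden states
    memory = causal_ngram_encode(embeddings, n=3)
    print(cosine_gated_inject(hidden_states, memory).shape)
```

Under these assumptions, memory that points away from the hidden state (negative cosine) is dropped entirely, and the only inputs are embeddings the model already has, which is what makes the module training-free.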

Key facts

NGM is a plug-and-play memory module for LLMs
It requires no additional training
Consists of a Causal N-Gram Encoder and a Cosine-Gated Memory Injector
Encoder averages pretrained token embeddings to form N-gram representations
No separate memory table or retrieval pipeline needed
Cosine gate with ReLU is non-parametric
Decouples knowledge storage from neural computation
Reported to be more efficient than Mixture-of-Experts (MoE) models and learned memory embeddings

Sources

arXiv cs.AI — 2026-05-19