MemRouter: Embedding-Based Memory Management for Long-Term Conversational Agents
MemRouter is a write-side memory router that decouples memory admission from autoregressive LLM generation in long-term conversational agents. Instead of running a per-turn memory-management decoding step, it uses an embedding-based routing policy to decide which conversation turns to store in external memory. MemRouter encodes each turn together with recent context, projects the embeddings through a frozen LLM backbone, and predicts whether to store the turn using lightweight classification heads, so only 12 million parameters are trained. In a controlled comparison on the LoCoMo benchmark, using the same retrieval pipeline, answer prompts, and QA backbone (Qwen2.5-7B), MemRouter outperformed an LLM-based memory manager on every question category. Its overall F1 score was 52.0 versus 45.6, with non-overlapping 95% confidence intervals. It also reduced the median (p50) memory-management latency.
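The admission path described above (encode the turn with recent context, pass it through a frozen backbone, score it with a small trainable head) can be sketched in a few lines. This is a toy illustration under stated assumptions, not MemRouter's implementation: the hashed bag-of-words encoder and the fixed random projection are stand-ins for the real encoder and frozen LLM backbone, a single logistic head stands in for the classification heads, and every name here (`ToyMemoryRouter`, `should_store`, `train_step`) is hypothetical.

```python
import math
import random


class ToyMemoryRouter:
    """Toy write-side router: frozen projection plus a small trainable head."""

    def __init__(self, dim=8, context_turns=2, threshold=0.5, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.context_turns = context_turns
        self.threshold = threshold
        # Assumption: a fixed random matrix stands in for the frozen LLM
        # backbone. It is stored as tuples and never updated.
        self.frozen = tuple(
            tuple(rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim))
            for _ in range(dim)
        )
        # The only trainable parameters: one logistic classification head.
        self.head_w = [0.0] * dim
        self.head_b = 0.0

    def _embed(self, text):
        # Stand-in encoder: deterministic hashed bag-of-words, L2-normalised.
        vec = [0.0] * self.dim
        for tok in text.lower().split():
            vec[sum(ord(c) for c in tok) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def _features(self, history, turn):
        # Encode the current turn together with the most recent context turns.
        k = self.context_turns
        window = (list(history[-k:]) if k else []) + [turn]
        x = self._embed(" ".join(window))
        return [math.tanh(sum(w * v for w, v in zip(row, x)))
                for row in self.frozen]

    def score(self, history, turn):
        # Probability that the turn should be written to external memory.
        h = self._features(history, turn)
        logit = sum(w * v for w, v in zip(self.head_w, h)) + self.head_b
        return 1.0 / (1.0 + math.exp(-logit))

    def should_store(self, history, turn):
        # Admission decision: a single forward pass, no token decoding.
        return self.score(history, turn) >= self.threshold

    def train_step(self, history, turn, label, lr=0.5):
        # One logistic-regression gradient step that touches only the head.
        h = self._features(history, turn)
        err = label - self.score(history, turn)
        self.head_w = [w + lr * err * v for w, v in zip(self.head_w, h)]
        self.head_b += lr * err
```

The design point the sketch tries to capture is that the store-or-skip decision is a classification over an embedding, so its cost does not depend on generating text, and training updates only the small head while the backbone weights stay fixed.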
Key facts
- MemRouter is a write-side memory router for long-term conversational agents.
- It decouples memory admission from autoregressive LLM generation.
- It uses an embedding-based routing policy instead of per-turn decoding.
- Only 12 million parameters are trained; the LLM backbone stays frozen.
- Tested on the LoCoMo benchmark with Qwen2.5-7B as the QA backbone.
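- Overall F1 score: 52.0 for MemRouter vs 45.6 for the LLM-based memory manager.
- The 95% confidence intervals for the two overall F1 scores do not overlap.
- Median (p50) memory-management latency was lower than with the LLM-based manager.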
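For readers who want to run a similar check on their own per-question scores, the non-overlap claim can be tested roughly as follows. The source does not say how its intervals were computed; the percentile bootstrap used here is one common choice and is an assumption, as are the function names `bootstrap_ci` and `intervals_overlap`.

```python
import random


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    # Pick the alpha/2 and 1 - alpha/2 percentiles, clamped to valid indices.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Calling `bootstrap_ci` once per system on its per-question F1 scores and passing the two results to `intervals_overlap` gives the check; a result of `False` corresponds to the non-overlapping case reported above.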
Entities
Institutions
- arXiv