DeferMem: Reinforcement Learning for Long-Term Memory QA

ai-technology · 2026-05-23

DeferMem is a long-term memory framework for LLM agents that splits memory processing into two stages: high-recall candidate retrieval and query-conditioned evidence distillation. A lightweight segment-link structure organizes the raw conversational history and supports broad candidate retrieval at query time. A memory distiller, trained with the reinforcement learning algorithm DistillPO, then reduces the high-recall but noisy candidates to query-specific evidence. The design targets two problems in long-term memory QA: evidence scattered across long histories, and large amounts of irrelevant content. It improves answer accuracy without pre-processing memory before the query is known.
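
The retrieve-then-distill flow can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Segment` layout, the temporal-neighbour links, the token-overlap retrieval, and above all `distill`, which is a simple overlap filter standing in for the DistillPO-trained distiller.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One chunk of raw history plus links to related segments (hypothetical layout)."""
    seg_id: int
    text: str
    links: set[int] = field(default_factory=set)


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_memory(turns: list[str]) -> dict[int, Segment]:
    """Store turns unchanged and link each one to its temporal neighbours (assumed link rule)."""
    memory = {i: Segment(i, t) for i, t in enumerate(turns)}
    for i in memory:
        for j in (i - 1, i + 1):
            if j in memory:
                memory[i].links.add(j)
    return memory


def retrieve_candidates(memory: dict[int, Segment], query: str) -> list[Segment]:
    """High-recall stage: segments sharing any token with the query, plus their linked segments."""
    q = tokens(query)
    hit_ids = {s.seg_id for s in memory.values() if q & tokens(s.text)}
    expanded = set(hit_ids)
    for i in hit_ids:
        expanded |= memory[i].links
    return [memory[i] for i in sorted(expanded)]


def distill(candidates: list[Segment], query: str, min_overlap: int = 2) -> list[str]:
    """Stand-in for the learned distiller: keep only segments with enough query overlap."""
    q = tokens(query)
    return [s.text for s in candidates if len(q & tokens(s.text)) >= min_overlap]


turns = [
    "I adopted a puppy named Biscuit last spring.",
    "Work has been busy this month.",
    "Biscuit loves the park near my office.",
    "I am planning a trip to Lisbon in June.",
]
query = "Which park does Biscuit love?"

memory = build_memory(turns)                     # no query-specific processing here
candidates = retrieve_candidates(memory, query)  # broad and noisy
evidence = distill(candidates, query)            # narrowed to the query
print(len(candidates), evidence)
```

On this toy history the retrieval stage returns all four segments and the filter keeps only the one about the park. The sketch only mirrors the division of labour: memory is stored raw, and all query-specific work happens after the query arrives.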

Key facts

DeferMem is a long-term memory framework for LLM agents.
It decouples memory processing into high-recall candidate retrieval and query-conditioned evidence distillation.
It uses a lightweight segment-link structure to organize raw conversational history.
It retrieves broad candidates at query time.
It applies a memory distiller trained with DistillPO, a reinforcement learning algorithm.
The distiller reduces high-recall but noisy candidates to query-specific evidence.
It addresses evidence scattered across long conversational histories and mixed with irrelevant content.
It improves answer accuracy without pre-processing memory before queries are known.

Entities

—

Sources

arXiv cs.AI — 2026-05-23