DeferMem: Reinforcement Learning for Long-Term Memory QA
DeferMem is a long-term memory framework for LLM agents that splits memory processing into two stages: high-recall candidate retrieval and query-conditioned evidence distillation. A lightweight segment-link structure organizes the raw conversational history and supports broad candidate retrieval at query time. A memory distiller, trained with the reinforcement learning algorithm DistillPO, then reduces these high-recall but noisy candidates to query-specific evidence. The design targets two problems in long-history QA: evidence that is scattered across the history, and irrelevant content mixed in with it. It improves answer accuracy without pre-processing memory before queries are known.
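To make the two-stage flow concrete, here is a minimal Python sketch of "retrieve broadly, then distill". It is an illustration under stated assumptions, not DeferMem's implementation. The `Segment` layout, the neighbour-only links, the lexical-overlap retrieval, and the `distill` function are all hypothetical. In particular, `distill` is a simple token-overlap filter standing in for the learned memory distiller, and DistillPO training is not shown.

```python
from dataclasses import dataclass, field
import re

# Tiny stopword list so that function words do not count as matches (assumption).
STOPWORDS = {"the", "is", "of", "a", "what", "s", "i", "to", "my"}


@dataclass
class Segment:
    """A contiguous chunk of raw conversation turns (hypothetical layout)."""
    seg_id: int
    turns: list[str]
    links: list[int] = field(default_factory=list)  # ids of linked segments


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def build_segments(history: list[str], size: int = 2) -> list[Segment]:
    """Split raw history into fixed-size segments and link each to its neighbours."""
    segments = [
        Segment(seg_id=i, turns=history[start:start + size])
        for i, start in enumerate(range(0, len(history), size))
    ]
    for seg in segments:
        seg.links = [j for j in (seg.seg_id - 1, seg.seg_id + 1)
                     if 0 <= j < len(segments)]
    return segments


def retrieve_candidates(query: str, segments: list[Segment]) -> list[Segment]:
    """High-recall stage: any token overlap is a hit, then expand along links."""
    q = tokenize(query)
    hit_ids = {s.seg_id for s in segments if q & tokenize(" ".join(s.turns))}
    expanded = set(hit_ids)
    for i in hit_ids:
        expanded.update(segments[i].links)
    return [segments[i] for i in sorted(expanded)]


def distill(query: str, candidates: list[Segment], min_overlap: int = 2) -> list[str]:
    """Stand-in for the trained distiller: keep turns sharing enough query tokens."""
    q = tokenize(query)
    return [turn for seg in candidates for turn in seg.turns
            if len(q & tokenize(turn)) >= min_overlap]


HISTORY = [
    "Alice: I adopted a dog named Biscuit last spring.",
    "Bob: Nice, the weather was great then.",
    "Carol: My sister moved to Lisbon.",
    "Bob: Did you try the new cafe?",
    "Alice: Biscuit the dog hates the rain.",
    "Bob: Rain is forecast all week.",
]
QUERY = "What is the name of Alice's dog?"

segments = build_segments(HISTORY)                # organize raw history, no query needed
candidates = retrieve_candidates(QUERY, segments)  # broad and noisy
evidence = distill(QUERY, candidates)              # narrow and query-specific
```

In this toy run the retrieval stage deliberately over-collects, pulling in a linked segment that shares no tokens with the query, and the distillation stage keeps only the turns that bear on the question. A trained distiller would replace the overlap heuristic with a model that reads the query and the candidates together.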
Key facts
- DeferMem is a long-term memory framework for LLM agents.
- It decouples memory processing into high-recall candidate retrieval and query-conditioned evidence distillation.
- Uses a lightweight segment-link structure to organize raw history.
- Retrieves broad candidates at query time.
- Applies a memory distiller trained with DistillPO, a reinforcement learning algorithm.
- The distiller reduces high-recall but noisy candidates to query-specific evidence.
- Addresses scattered evidence across long conversational histories.
- Improves answer accuracy without pre-processing memory before queries are known.