MemQ: Q-Learning for Self-Evolving Memory Agents on Provenance DAGs
MemQ introduces a technique that applies TD(λ) eligibility traces to memory Q-values in large language model agents. Credit flows backward through a provenance directed acyclic graph (DAG), in contrast to prior methods that treat memories in isolation. Dependency chains are emphasized by weighting credit as (γλ)^d, where d is the depth in the DAG, so structural closeness takes priority over temporal recency. The framework is formalized as an Exogenous-Context Markov Decision Process (MDP), separating the external task from the agent's internal memory. MemQ reports the highest success rate across six benchmarks, including operating-system interaction and expert-level question answering.
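The (γλ)^d-weighted backward credit flow described above can be sketched as follows. This is a minimal illustrative implementation, not MemQ's actual code: the function name `propagate_credit`, the `parents` adjacency map, the `q_values` dictionary, and the learning rate `alpha` are all assumptions introduced for the example.

```python
from collections import deque

def propagate_credit(parents, q_values, leaf, td_error,
                     gamma=0.99, lam=0.9, alpha=0.1):
    """Push a TD error from `leaf` backward through its provenance
    ancestors, scaling the update by (gamma * lam) ** d, where d is the
    hop distance from the leaf to each ancestor in the DAG.
    (Hypothetical sketch of the MemQ-style update, not the paper's API.)
    """
    # BFS along parent edges; record the minimal depth of each ancestor,
    # so credit decays with structural distance rather than time.
    depth = {leaf: 0}
    queue = deque([leaf])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            if p not in depth:  # first visit gives the shortest depth
                depth[p] = depth[node] + 1
                queue.append(p)
    # Eligibility-trace-style update: closer ancestors receive more credit.
    for node, d in depth.items():
        q_values[node] = q_values.get(node, 0.0) \
            + alpha * (gamma * lam) ** d * td_error
    return q_values
```

For example, with a chain `a → b → c` (so `parents = {"c": ["b"], "b": ["a"]}`) and a TD error of 1.0, memory `c` is updated with weight 1, `b` with γλ ≈ 0.891, and `a` with (γλ)² ≈ 0.794, before scaling by `alpha`.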
Key facts
- MemQ applies TD(λ) eligibility traces to memory Q-values
- Credit propagates backward through a provenance DAG
- Credit weight decays as (γλ)^d with DAG depth d
- Formalized as an Exogenous-Context MDP
- Tested on six benchmarks: OS interaction, function calling, code generation, multimodal reasoning, embodied reasoning, expert-level QA
- Achieves highest success rate across all benchmarks
Entities
—