MemQ: Q-Learning for Self-Evolving Memory Agents on Provenance DAGs
MemQ introduces a technique that applies TD(λ) eligibility traces to memory Q-values in large language model agents. Credit flows backward through a provenance directed acyclic graph (DAG), in contrast to prior methods that treat memories in isolation. Dependency chains are emphasized by weighting credit as (γλ)^d, where d is the depth in the DAG, so structural closeness takes priority over temporal recency. The framework is formalized as an Exogenous-Context Markov Decision Process (MDP), separating the external task from the agent's internal memory. MemQ reports the highest success rate across six benchmarks, including operating-system interaction and expert-level question answering.
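The (γλ)^d-weighted backward credit flow described above can be sketched as follows. This is a minimal illustrative implementation, not MemQ's actual code: the function name `propagate_credit`, the `parents` adjacency map, the `q_values` dictionary, and the learning rate `alpha` are all assumptions introduced for the example.

```python
from collections import deque

def propagate_credit(parents, q_values, leaf, td_error,
                     gamma=0.99, lam=0.9, alpha=0.1):
    """Push a TD error from `leaf` backward through its provenance
    ancestors, scaling the update by (gamma * lam) ** d, where d is the
    hop distance from the leaf to each ancestor in the DAG.
    (Hypothetical sketch of the MemQ-style update, not the paper's API.)
    """
    # BFS along parent edges; record the minimal depth of each ancestor,
    # so credit decays with structural distance rather than time.
    depth = {leaf: 0}
    queue = deque([leaf])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, ()):
            if p not in depth:  # first visit gives the shortest depth
                depth[p] = depth[node] + 1
                queue.append(p)
    # Eligibility-trace-style update: closer ancestors receive more credit.
    for node, d in depth.items():
        q_values[node] = q_values.get(node, 0.0) \
            + alpha * (gamma * lam) ** d * td_error
    return q_values
```

For example, with a chain `a → b → c` (so `parents = {"c": ["b"], "b": ["a"]}`) and a TD error of 1.0, memory `c` is updated with weight 1, `b` with γλ ≈ 0.891, and `a` with (γλ)² ≈ 0.794, before scaling by `alpha`.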
Key facts
- MemQ applies TD(λ) eligibility traces to memory Q-values
- Credit propagates backward through a provenance DAG
- Credit weight decays as (γλ)^d with DAG depth d
- Formalized as an Exogenous-Context MDP
- Tested on six benchmarks: OS interaction, function calling, code generation, multimodal reasoning, embodied reasoning, expert-level QA
- Achieves highest success rate across all benchmarks
Entities
—