MemAudit: Post-hoc Framework Detects Poisoned Agent Memory via Causal Attribution

ai-technology · 2026-05-25

Researchers propose MemAudit, a post-hoc causal memory auditing framework for large language model (LLM) agents that use persistent memory. The framework addresses a security vulnerability where adversarial users can inject malicious records into agent memory through ordinary interactions, which later steer reasoning and actions. Existing defenses focus on online intervention (e.g., prompt filtering, output blocking) but do not identify which stored memories caused harmful behavior after the fact. MemAudit combines two signals: a counterfactual memory influence score measuring each memory's causal contribution to harmful outputs, and structural anomaly detection. The paper is published on arXiv (2605.23723).

Key facts

MemAudit is a post-hoc causal memory auditing framework for memory-augmented LLM agents.
It addresses security vulnerability from adversarial memory injection.
Combines counterfactual memory influence score and structural anomaly detection.
Existing defenses only offer online intervention, not post-hoc attribution.
Paper published on arXiv with ID 2605.23723.

MemAudit: Post-hoc Framework Detects Poisoned Agent Memory via Causal Attribution

Key facts

Entities

Institutions

Sources