ARTFEED — Contemporary Art Intelligence

MEMSAD Defense Against Memory Poisoning in LLM Agents

ai-technology · 2026-05-07

Researchers have formalized memory poisoning attacks on retrieval-augmented LLM agents as a Stackelberg game and present a unified evaluation framework spanning three attack classes with escalating access assumptions. After correcting a protocol inconsistency in Chen et al. (2024), a faithful evaluation shows a fourfold increase in attack success rate (ASR-R rises from 0.25 to 1.00). The key contribution, MEMSAD (Semantic Anomaly Detection), is a calibrated defense built on a gradient coupling theorem: under encoder regularity conditions, the gradients of the anomaly score and the retrieval objective align, so any continuous perturbation that lowers detection risk also lowers retrieval rank. This coupling yields a certified detection radius and underscores persistent external memory as a largely uncharacterized attack surface for LLM agents.
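The gradient coupling intuition can be illustrated with a toy model. Everything below is an assumption for illustration, not the paper's construction: the anomaly score is taken as distance from a benign-memory centroid `mu`, the retrieval objective as inner product with a targeted query direction `q`, and the poisoned entry is placed toward `q` so it ranks highly. In this setting the two gradients align, so a perturbation step that lowers the anomaly score necessarily lowers the retrieval score as well:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
mu = np.zeros(d)                                  # toy benign-memory centroid
q = rng.normal(size=d)
q /= np.linalg.norm(q)                            # toy attacker-targeted query direction
x = mu + 3.0 * q + 0.1 * rng.normal(size=d)       # poisoned entry: far from mu, close to q

def anomaly(v):
    # toy anomaly score: distance from the benign centroid
    return np.linalg.norm(v - mu)

def retrieval(v):
    # toy retrieval objective: similarity to the targeted query
    return v @ q

# Gradients of the two objectives at the poisoned point
g_s = (x - mu) / np.linalg.norm(x - mu)           # grad of anomaly score (unit vector)
g_r = q                                           # grad of retrieval objective (unit vector)
cos = g_s @ g_r                                   # alignment between the two gradients

# An evasion step down the anomaly gradient also drops retrieval score,
# mirroring the coupling claim: lowering detection risk harms retrieval rank.
eps = 0.5
x_evasive = x - eps * g_s
assert anomaly(x_evasive) < anomaly(x)
assert retrieval(x_evasive) < retrieval(x)
```

The alignment `cos` is close to 1 here because the poison must sit in the query's direction to be retrieved, which is exactly what makes it anomalous relative to benign memory.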

Key facts

  • Memory poisoning attacks on retrieval-augmented agents formalized as Stackelberg game
  • Unified evaluation framework covers three attack classes with escalating access assumptions
  • Correction of Chen et al. (2024) protocol inconsistency increases ASR-R from 0.25 to 1.00
  • MEMSAD defense uses gradient coupling theorem linking anomaly score and retrieval gradients
  • Certified detection radius guarantee provided by the coupling
  • Persistent external memory security properties previously uncharacterized
  • Attack success rate (ASR-R) measured under faithful evaluation
  • Defense ensures any perturbation reducing detection risk harms retrieval rank
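The article does not spell out how ASR-R is computed; a minimal sketch, assuming ASR-R is the fraction of attack queries whose top-k retrieval returns at least one poisoned memory entry (the function name and signature are hypothetical):

```python
def asr_r(topk_per_query, poison_ids):
    """Toy retrieval-based attack success rate (assumed definition).

    topk_per_query: list of top-k retrieved entry-id lists, one per attack query.
    poison_ids: set of ids belonging to poisoned memory entries.
    Returns the fraction of queries that retrieved any poisoned entry.
    """
    hits = sum(1 for topk in topk_per_query if poison_ids & set(topk))
    return hits / len(topk_per_query)

# Four attack queries; poisoned entry 7 retrieved for one vs. all of them.
low = asr_r([[7, 2], [1, 3], [4, 5], [6, 8]], {7})   # 0.25
high = asr_r([[7, 2], [7, 1], [3, 7], [7, 9]], {7})  # 1.00
```

Under this reading, the reported jump from 0.25 to 1.00 means the corrected protocol retrieves the poison for every attack query rather than one in four.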

Entities

Institutions

  • arXiv
