ARTFEED — Contemporary Art Intelligence

MEMSAD Defense Against Memory Poisoning in LLM Agents

ai-technology · 2026-05-07

Researchers have formalized memory poisoning attacks on retrieval-augmented LLM agents as a Stackelberg game and present a unified evaluation framework spanning three attack classes with escalating access assumptions. After correcting a protocol inconsistency in Chen et al. (2024), a faithful evaluation shows a fourfold increase in attack success rate (ASR-R rises from 0.25 to 1.00). The key contribution, MEMSAD (Semantic Anomaly Detection), is a calibrated defense built on a gradient coupling theorem: under encoder regularity conditions, the gradients of the anomaly score and the retrieval objective align, so any continuous perturbation that lowers detection risk also lowers retrieval rank. This coupling yields a certified detection radius and underscores persistent external memory as a largely uncharacterized attack surface for LLM agents.
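The gradient coupling intuition can be illustrated with a toy model. Everything below is an assumption for illustration, not the paper's construction: the anomaly score is taken as distance from a benign-memory centroid `mu`, the retrieval objective as inner product with a targeted query direction `q`, and the poisoned entry is placed toward `q` so it ranks highly. In this setting the two gradients align, so a perturbation step that lowers the anomaly score necessarily lowers the retrieval score as well:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
mu = np.zeros(d)                                  # toy benign-memory centroid
q = rng.normal(size=d)
q /= np.linalg.norm(q)                            # toy attacker-targeted query direction
x = mu + 3.0 * q + 0.1 * rng.normal(size=d)       # poisoned entry: far from mu, close to q

def anomaly(v):
    # toy anomaly score: distance from the benign centroid
    return np.linalg.norm(v - mu)

def retrieval(v):
    # toy retrieval objective: similarity to the targeted query
    return v @ q

# Gradients of the two objectives at the poisoned point
g_s = (x - mu) / np.linalg.norm(x - mu)           # grad of anomaly score (unit vector)
g_r = q                                           # grad of retrieval objective (unit vector)
cos = g_s @ g_r                                   # alignment between the two gradients

# An evasion step down the anomaly gradient also drops retrieval score,
# mirroring the coupling claim: lowering detection risk harms retrieval rank.
eps = 0.5
x_evasive = x - eps * g_s
assert anomaly(x_evasive) < anomaly(x)
assert retrieval(x_evasive) < retrieval(x)
```

The alignment `cos` is close to 1 here because the poison must sit in the query's direction to be retrieved, which is exactly what makes it anomalous relative to benign memory.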

Key facts

  • Memory poisoning attacks on retrieval-augmented agents formalized as Stackelberg game
  • Unified evaluation framework covers three attack classes with escalating access assumptions
  • Correction of Chen et al. (2024) protocol inconsistency increases ASR-R from 0.25 to 1.00
  • MEMSAD defense uses gradient coupling theorem linking anomaly score and retrieval gradients
  • Certified detection radius guarantee provided by the coupling
  • Persistent external memory security properties previously uncharacterized
  • Attack success rate (ASR-R) measured under faithful evaluation
  • Defense ensures any perturbation reducing detection risk harms retrieval rank
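The article does not spell out how ASR-R is computed; a minimal sketch, assuming ASR-R is the fraction of attack queries whose top-k retrieval returns at least one poisoned memory entry (the function name and signature are hypothetical):

```python
def asr_r(topk_per_query, poison_ids):
    """Toy retrieval-based attack success rate (assumed definition).

    topk_per_query: list of top-k retrieved entry-id lists, one per attack query.
    poison_ids: set of ids belonging to poisoned memory entries.
    Returns the fraction of queries that retrieved any poisoned entry.
    """
    hits = sum(1 for topk in topk_per_query if poison_ids & set(topk))
    return hits / len(topk_per_query)

# Four attack queries; poisoned entry 7 retrieved for one vs. all of them.
low = asr_r([[7, 2], [1, 3], [4, 5], [6, 8]], {7})   # 0.25
high = asr_r([[7, 2], [7, 1], [3, 7], [7, 9]], {7})  # 1.00
```

Under this reading, the reported jump from 0.25 to 1.00 means the corrected protocol retrieves the poison for every attack query rather than one in four.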

Entities

Institutions

  • arXiv
