ARTFEED — Contemporary Art Intelligence

Memory Laundering: Hidden Toxicity in LLM Agent Memory

ai-technology · 2026-05-20

A new arXiv preprint (2605.16746) identifies a failure mode in memory-augmented LLM agents called 'memory laundering': toxic or adversarial context is compressed into memory summaries that evade standard toxicity detectors while preserving hostile framing. Using paired counterfactual multi-agent rollouts, the researchers show that such summaries score below common detection thresholds yet still increase downstream toxicity relative to neutral baselines. They introduce the sub-threshold propagation gap (SPG) metric to quantify this hidden influence. The work highlights that safety in persistent-state agents depends not only on what agents output but also on what they store and reuse in memory.
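The summary above does not give the paper's formula for SPG, so the following is only an illustrative sketch of how a gap of this kind could be computed from paired counterfactual rollouts. The names PairedRollout and sub_threshold_gap, the 0.5 threshold, and the choice of a simple mean difference are all assumptions made for illustration, not the authors' definition.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairedRollout:
    """One counterfactual pair (hypothetical structure, not from the paper)."""
    summary_toxicity: float      # detector score of the stored memory summary
    downstream_laundered: float  # downstream toxicity when that summary is reused
    downstream_neutral: float    # downstream toxicity in the neutral-baseline arm


def sub_threshold_gap(rollouts, threshold=0.5):
    """Mean downstream toxicity lift over pairs whose summary passes the detector.

    Only pairs with a summary score below `threshold` are counted, since those
    are the cases a detector-based filter would let through.
    """
    hidden = [r for r in rollouts if r.summary_toxicity < threshold]
    if not hidden:
        return 0.0  # no sub-threshold summaries, so no hidden influence measured
    return mean(r.downstream_laundered - r.downstream_neutral for r in hidden)


# Toy numbers, invented purely to exercise the function.
example = [
    PairedRollout(0.2, 0.4, 0.1),  # passes the detector, raises toxicity
    PairedRollout(0.3, 0.3, 0.2),  # passes the detector, small lift
    PairedRollout(0.9, 0.8, 0.1),  # flagged by the detector, excluded
]
gap = sub_threshold_gap(example)
```

A positive value under this sketch would mean that summaries a detector accepts still push later outputs toward toxicity compared with the neutral arm, which is the effect the article describes.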

Key facts

  • arXiv paper 2605.16746 studies memory laundering in LLM agents
  • Toxic context can be compressed into memory summaries that evade detectors
  • Memory summaries that score below toxicity thresholds still increase downstream toxicity relative to neutral baselines
  • Sub-threshold propagation gap (SPG) measures hidden influence
  • Safety depends on what agents store and reuse, not just outputs

Entities

Institutions

  • arXiv
