Memory Laundering: Hidden Toxicity in LLM Agent Memory

ai-technology · 2026-05-20

A new arXiv preprint (2605.16746) identifies a failure mode in memory-augmented LLM agents called 'memory laundering': toxic or adversarial context is compressed into memory summaries that evade standard toxicity detectors while preserving the hostile framing. Using paired counterfactual multi-agent rollouts, the researchers show that such summaries score below common detector thresholds yet still increase downstream toxicity relative to neutral baselines. They introduce the sub-threshold propagation gap (SPG) metric to quantify this hidden influence. The work highlights that safety in persistent-state agents depends not only on what an agent outputs but also on what it stores in memory and later reuses.

Key facts

arXiv paper 2605.16746 studies memory laundering in LLM agents
Toxic context can be compressed into memory summaries that evade detectors
Memory summaries below toxicity thresholds still increase downstream toxicity
Sub-threshold propagation gap (SPG) measures hidden influence
Safety depends on what agents store and reuse, not just outputs
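
The summary above does not give the paper's formula for SPG, so the following is only a minimal sketch of one plausible reading: among paired rollouts whose memory summary scores below the detector threshold, take the average difference in downstream toxicity between the branch that reuses the summary and its neutral counterfactual. The names `PairedRollout` and `sub_threshold_propagation_gap`, the 0.5 threshold, and the averaging scheme are all assumptions made for illustration, not the authors' definition.

```python
from statistics import mean
from typing import NamedTuple, Optional, Sequence


class PairedRollout(NamedTuple):
    """One counterfactual pair (hypothetical structure, not from the paper)."""
    summary_toxicity: float    # detector score of the stored memory summary
    downstream_toxic: float    # downstream toxicity when that summary is reused
    downstream_neutral: float  # downstream toxicity in the neutral baseline


def sub_threshold_propagation_gap(
    pairs: Sequence[PairedRollout],
    threshold: float = 0.5,  # assumed detector threshold
) -> Optional[float]:
    """Mean downstream toxicity gap over pairs whose summary evades the detector.

    Returns None when no summary falls below the threshold, since the gap
    is undefined in that case under this illustrative definition.
    """
    # Keep only the pairs where the summary would pass a threshold-based filter.
    hidden = [p for p in pairs if p.summary_toxicity < threshold]
    if not hidden:
        return None
    # Average the per-pair difference against the neutral counterfactual.
    return mean(p.downstream_toxic - p.downstream_neutral for p in hidden)


if __name__ == "__main__":
    # Toy scores chosen for illustration only; they are not results from the study.
    example = [
        PairedRollout(0.25, 0.5, 0.25),
        PairedRollout(0.25, 0.75, 0.25),
        PairedRollout(0.9, 1.0, 0.0),  # flagged by the detector, so excluded
    ]
    print(sub_threshold_propagation_gap(example))
```

Under this reading, a positive value means that summaries a detector would pass are still associated with more toxic downstream behavior than the neutral baseline, which is the hidden influence the metric is meant to surface.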
