Memory Laundering: Hidden Toxicity in LLM Agent Memory
A study posted to arXiv (2605.16746) identifies a failure mode in memory-augmented LLM agents called "memory laundering," in which toxic or adversarial context is compressed into memory summaries that evade standard toxicity detectors while preserving the hostile framing. Using paired counterfactual multi-agent rollouts, the researchers show that such summaries score below common detection thresholds yet still increase downstream toxicity relative to neutral baselines. They introduce the sub-threshold propagation gap (SPG) metric to quantify this hidden influence. The work highlights that safety in persistent-state agents depends not only on what agents output but also on what they store in memory and later reuse.
Key facts
- arXiv paper 2605.16746 studies memory laundering in LLM agents
- Toxic context can be compressed into memory summaries that evade detectors
- Memory summaries that score below toxicity thresholds still increase downstream toxicity relative to neutral baselines
- Sub-threshold propagation gap (SPG) quantifies the hidden influence of those summaries
- Safety depends on what agents store and reuse, not just outputs
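The idea behind a sub-threshold gap can be sketched in code. The summary above does not give the paper's formula for SPG, so the following is only an illustrative assumption: for each counterfactual pair, take the difference in downstream toxicity between the toxic-context arm and the neutral arm, and average it over the pairs whose memory summary scored below the detector threshold. The `PairedRollout` schema, the `sub_threshold_gap` function, the 0.5 threshold, and the scores are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PairedRollout:
    """One counterfactual pair sharing the same task (hypothetical schema)."""
    summary_toxicity: float    # detector score of the memory summary, toxic-context arm
    downstream_toxic: float    # mean detector score of later outputs, toxic-context arm
    downstream_neutral: float  # mean detector score of later outputs, neutral arm


def sub_threshold_gap(rollouts, threshold=0.5):
    """Illustrative gap, not the paper's definition of SPG.

    Averages (toxic arm - neutral arm) downstream toxicity over pairs whose
    memory summary scored below the threshold, i.e. summaries a detector
    would have let through. Returns None when no pair qualifies.
    """
    gaps = [
        r.downstream_toxic - r.downstream_neutral
        for r in rollouts
        if r.summary_toxicity < threshold
    ]
    return mean(gaps) if gaps else None


# Made-up scores for illustration only.
rollouts = [
    PairedRollout(0.10, 0.30, 0.10),
    PairedRollout(0.20, 0.25, 0.15),
    PairedRollout(0.80, 0.90, 0.10),  # summary is flagged, so the pair is excluded
]
gap = sub_threshold_gap(rollouts)
print(gap)
```

With these made-up scores the gap is about 0.15: the two summaries that pass the detector are still associated with more toxic downstream behavior than their neutral counterparts. A gap near zero would suggest that filtering memory by detector score alone is sufficient.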
Entities
Institutions
- arXiv