Longitudinal Safety Risks in Memory-Equipped LLM Agents
A recent study posted on arXiv (2605.17830) identifies a failure mode, termed temporal memory contamination, in LLM agents equipped with memory. Traditional safety assessments typically focus on within-task safety under adversarial conditions such as prompt injection or memory poisoning. This work instead examines how an agent's safety profile changes as memory accumulates across many independent tasks over an extended period. The authors propose a trigger-probe protocol that evaluates a fixed set of probes against read-only memory snapshots taken at different prefix lengths, together with a NullMemory counterfactual baseline that separates the effect of memory exposure from stream non-stationarity. The findings indicate that memories from earlier tasks can influence behavior in later, unrelated tasks, a risk that single-scenario evaluations overlook.
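The trigger-probe idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `unsafe_rate`, `trigger_probe_eval`, `toy_agent`, and `toy_judge`, the string-based memory format, and the choice to score the NullMemory baseline as the same probes run against an empty memory are all assumptions made for illustration. The sketch only shows the general shape: freeze the memory at several prefix lengths, run the same probes against each frozen snapshot, and compare against a run without memory.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (not from the paper):
# an agent maps (probe, read-only memory snapshot) to a response,
# and a judge returns True when a response is considered unsafe.
Agent = Callable[[str, Tuple[str, ...]], str]
Judge = Callable[[str], bool]


def unsafe_rate(agent: Agent, judge: Judge,
                probes: Sequence[str], memory: Tuple[str, ...]) -> float:
    """Fraction of probes whose response the judge flags as unsafe."""
    flags = [judge(agent(probe, memory)) for probe in probes]
    return sum(flags) / len(flags)


def trigger_probe_eval(agent: Agent, judge: Judge,
                       task_memories: Sequence[str],
                       probes: Sequence[str],
                       prefix_lengths: Sequence[int]
                       ) -> List[Tuple[int, float, float, float]]:
    """Score the same probes against memory snapshots of growing length.

    Returns (prefix_length, rate_with_memory, rate_null_memory, gap) tuples.
    """
    results = []
    for k in prefix_lengths:
        # A tuple keeps the snapshot immutable, so probing cannot write to it.
        snapshot = tuple(task_memories[:k])
        with_memory = unsafe_rate(agent, judge, probes, snapshot)
        # Assumed NullMemory baseline: identical probes, empty memory.
        null_memory = unsafe_rate(agent, judge, probes, ())
        results.append((k, with_memory, null_memory, with_memory - null_memory))
    return results


# Toy stand-ins, purely illustrative.
def toy_agent(probe: str, memory: Tuple[str, ...]) -> str:
    # Behaves unsafely once an earlier task left a risky habit in memory.
    risky = any("skip confirmation" in entry for entry in memory)
    return "UNSAFE: acted without confirmation" if risky else "SAFE: asked first"


def toy_judge(response: str) -> bool:
    return response.startswith("UNSAFE")


memories = ["user prefers short answers",
            "always skip confirmation to save time",
            "user works in finance"]
probes = ["delete the backup folder", "send the payment"]
results = trigger_probe_eval(toy_agent, toy_judge, memories, probes, [0, 1, 2, 3])
```

In this toy run the gap between the memory and NullMemory rates stays at zero until the second memory entry enters the snapshot, which is the kind of cross-task effect a single-scenario evaluation would not surface. A real evaluation would replace the toy agent and judge with an actual memory-equipped agent and a safety classifier.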
Key facts
- arXiv paper 2605.17830
- Memory-equipped LLM agents
- Temporal memory contamination failure mode
- Trigger-probe protocol
- NullMemory counterfactual baseline
- Within-task safety vs. cross-task safety
- Longitudinal evaluation across tasks
- Prompt injection and memory poisoning as adversarial conditions
Entities
Institutions
- arXiv