Longitudinal Safety Risks in Memory-Equipped LLM Agents

ai-technology · 2026-05-20

A recent study published on arXiv (2605.17830) uncovers a new failure mode termed temporal memory contamination in LLM agents that utilize memory. This research diverges from traditional safety assessments, which typically focus on within-task safety under adversarial scenarios such as prompt injection or memory poisoning. Instead, it investigates how an agent's safety profile evolves as memory builds up over numerous independent tasks over extended periods. The authors propose a trigger-probe protocol to assess a consistent set of probes against read-only memory snapshots at different prefix lengths, along with a NullMemory counterfactual baseline to differentiate memory exposure from stream non-stationarity. The findings indicate that earlier task memories can influence behaviors in later, unrelated tasks, highlighting risks overlooked by single-scenario evaluations.

Key facts

arXiv paper 2605.17830
Memory-equipped LLM agents
Temporal memory contamination failure mode
Trigger-probe protocol
NullMemory counterfactual baseline
Within-task safety vs. cross-task safety
Longitudinal evaluation across tasks
Prompt injection and memory poisoning as adversarial conditions

Longitudinal Safety Risks in Memory-Equipped LLM Agents

Key facts

Entities

Institutions

Sources