Sleeper Memory Poisoning Attack on LLM Assistants
A novel security flaw has been discovered in large language models (LLMs) that utilize persistent memory, enabling assistants to retain user-specific data over multiple sessions. This vulnerability, referred to as 'sleeper memory poisoning,' involves an attacker manipulating external resources—like documents, web pages, or repositories—to induce the assistant to save false memories about the user. Unlike traditional prompt injection attacks, this method can lie dormant and resurface in future conversations. The research assessed the entire attack process, verifying if contaminated memories could be stored, retrieved, and subsequently influence later exchanges. In stateful LLM assistants, the addition of poisoned memories reached 99.8% for GPT-5.5 and 95% for Kimi-K2.6, underscoring a significant security threat as LLMs increasingly adopt persistent memory for enhanced personalization.
Key facts
- Sleeper memory poisoning is a delayed attack on LLMs with persistent memory.
- Attack manipulates external context to store fabricated memories about users.
- Unlike prompt injection, the attack can remain dormant across conversations.
- Evaluated on GPT-5.5 and Kimi-K2.6 assistants.
- Poisoned memories added up to 99.8% on GPT-5.5.
- Poisoned memories added up to 95% on Kimi-K2.6.
- Attack pipeline includes writing, retrieval, and steering of conversations.
- Vulnerability arises from stateful memory for personalization.
Entities
Institutions
- arXiv