Termination Poisoning Attacks Exploit LLM Agent Loops
A new arXiv paper (2605.05846) identifies a critical vulnerability in LLM agents that operate in iterative execution loops. Researchers define 'Termination Poisoning' as an attack where malicious prompts distort an agent's self-evaluation, causing it to believe a task is incomplete and leading to unbounded computation. The study designs 10 representative attack strategies and tests them across 8 LLM agents and 60 tasks. Results show distinct behavioral signatures in different agents that determine attack success, offering transferable patterns for crafting attacks against unseen agents. The work highlights a systemic risk in autonomous agent architectures.
Key facts
- arXiv paper 2605.05846 defines Termination Poisoning attacks on LLM agents
- Attacks exploit iterative execution loops where agents reason, act, and self-evaluate
- Malicious prompts can distort termination judgment, causing unbounded computation
- 10 representative attack strategies were designed
- Empirical study covered 8 LLM agents and 60 tasks
- Different agents exhibit distinct behavioral signatures affecting attack success
- Transferable patterns can guide attacks on unseen agents
- The vulnerability is inherent to self-directed loop architectures
Entities
Institutions
- arXiv