Study Reveals Vulnerabilities in AI Tool-Calling Agents
A recent study posted on arXiv (2605.30686) investigates indirect prompt injection attacks against ReAct agents, which combine chain-of-thought reasoning with tool use. These agents, deployed for tasks such as scheduling, file retrieval, and data access, are vulnerable when an attacker manipulates a tool's output to insert malicious instructions. The study examines three underexplored risk dimensions: injection depth (where the payload appears in the tool-call sequence), payload framing (the rhetorical style of the payload), and turn-budget sensitivity (how sensitive the attack is to the permitted number of turns). The authors ran four controlled experiments covering 20 scenarios in five attack categories, for a total of 460 trials against GPT-4o-mini and Claude Haiku at a combined API cost of under $0.36. In Study 1, the attack success rate (ASR) against GPT-4o-mini is 60% at shallow injection depths and declines as the payload is placed deeper in the tool sequence. The findings point to significant security weaknesses in existing agent implementations.
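To make the three dimensions concrete, the following is a minimal sketch of how a single trial could be parameterized and how a payload could be placed at a given injection depth. It is not the authors' harness: the names `InjectionTrial` and `inject`, the 1-based depth convention, and the choice to append the payload to a tool output are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionTrial:
    """Hypothetical description of one trial (field names are assumptions)."""
    depth: int        # 1-based index of the tool call whose output carries the payload
    framing: str      # label for the payload's rhetorical style
    turn_budget: int  # maximum number of agent turns allowed


def inject(tool_outputs, payload, depth):
    """Return a copy of tool_outputs with payload appended to the output at `depth`."""
    if not 1 <= depth <= len(tool_outputs):
        raise ValueError("depth must be between 1 and the number of tool outputs")
    poisoned = list(tool_outputs)  # copy so the clean sequence stays untouched
    poisoned[depth - 1] = f"{poisoned[depth - 1]}\n{payload}"
    return poisoned
```

With this convention, `depth=1` poisons the first tool result the agent sees, and larger values push the payload later in the tool sequence.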
Key facts
- Study examines indirect prompt injection in ReAct agents
- 20 scenarios across five attack categories tested
- 460 trials conducted against GPT-4o-mini and Claude Haiku
- Combined API cost under 0.36 USD
- GPT-4o-mini attack success rate falls from 60% as injection depth increases
- Three risk dimensions explored: depth, framing, turn-budget
- Agents used for scheduling, file retrieval, data access
- Published on arXiv with ID 2605.30686
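The attack success rate per injection depth listed above can be computed as the fraction of successful trials at each depth. The sketch below assumes trial records are simple `(depth, succeeded)` pairs; the function name `asr_by_depth` and the record format are hypothetical, and the sample records are toy values, not data from the study.

```python
from collections import defaultdict


def asr_by_depth(trials):
    """Map each injection depth to its attack success rate (successes / trials)."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for depth, succeeded in trials:
        totals[depth] += 1
        successes[depth] += int(succeeded)
    return {depth: successes[depth] / totals[depth] for depth in sorted(totals)}


# Toy records for illustration only; these are NOT results from the paper.
toy_trials = [(1, True), (1, True), (1, False),
              (2, True), (2, False), (2, False), (2, False)]
print(asr_by_depth(toy_trials))
```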