Study Reveals Vulnerabilities in AI Tool-Calling Agents

ai-technology · 2026-06-01

A recent study posted to arXiv (2605.30686) investigates indirect prompt injection attacks on ReAct agents, which combine chain-of-thought reasoning with tool use. Such agents are used for tasks like scheduling, file retrieval, and data access, and they become vulnerable when an attacker manipulates a tool's output to embed harmful instructions. The study examines three under-explored risk dimensions: injection depth (where in the tool sequence the payload appears), payload framing (the rhetorical style of the injected text), and turn-budget sensitivity (the number of turns the agent is permitted). The authors ran four controlled experiments covering 20 scenarios in five attack categories, for a total of 460 trials against GPT-4o-mini and Claude Haiku at a combined API cost under $0.36. In Study 1, the attack success rate (ASR) against GPT-4o-mini starts at 60% at shallow injection depths and falls as the payload is placed deeper in the tool sequence. The results point to significant security weaknesses in existing agent implementations.
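To make the ASR-by-depth metric concrete, here is a minimal Python sketch of how such a rate could be tallied from trial records. The record format, the function name `attack_success_rate`, and the toy records are all assumptions made for illustration; they are not taken from the paper and do not reproduce its data.

```python
from collections import defaultdict


def attack_success_rate(trials):
    """Return {depth: fraction of trials at that depth where the attack succeeded}.

    `trials` is an iterable of (injection_depth, attack_succeeded) pairs.
    This record layout is hypothetical, chosen only for illustration.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for depth, succeeded in trials:
        totals[depth] += 1
        if succeeded:
            hits[depth] += 1
    return {depth: hits[depth] / totals[depth] for depth in sorted(totals)}


# Placeholder records, NOT results from the study.
toy_trials = [
    (1, True), (1, True), (1, False),
    (2, True), (2, False), (2, False), (2, False),
]
asr_by_depth = attack_success_rate(toy_trials)
```

Grouping by depth before dividing keeps each rate independent of how many trials were run at the other depths, which matters when depths are sampled unevenly.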

Key facts

Study examines indirect prompt injection in ReAct agents
20 scenarios across five attack categories tested
460 trials conducted against GPT-4o-mini and Claude Haiku
Combined API cost under 0.36 USD
Attack success rate against GPT-4o-mini decays from 60% as injection depth increases
Three risk dimensions explored: depth, framing, turn-budget
Agents used for scheduling, file retrieval, data access
Published on arXiv with ID 2605.30686
