LLM Agents Face New Prompt Injection Surface in Tool Descriptions
Research indicates that LLM agents enhanced by tools are susceptible to prompt injection not just through outputs from these tools but also via their descriptions, which the agent examines prior to tool activation. The researchers maintained the injection payload as byte-identical and tested it across both channels on 13 LLMs from six different families and four task suites. Findings reveal an inversion in vulnerability patterns among models: GPT-4.1 exhibits a 96% vulnerability rate on tool outputs but just 4% on tool descriptions, whereas Gemini 3 Flash shows 20% and 98%, respectively. A variance analysis of 6,830 trials shows that 0% of the variation is due to the injection surface, suggesting that model architecture influences vulnerability. The study, titled "The Surface You Test Is Not the Surface That Breaks," is available on arXiv under ID 2605.30454.
Key facts
- Tool-augmented LLM agents are vulnerable to prompt injection via tool descriptions.
- Attackers can choose between tool output and tool description surfaces.
- Payload was byte-identical across both surfaces.
- 13 LLMs from six families and four task suites were tested.
- GPT-4.1 is 96% vulnerable on tool outputs, 4% on tool descriptions.
- Gemini 3 Flash is 20% vulnerable on tool outputs, 98% on tool descriptions.
- Variance decomposition over 6,830 attempts shows 0% variation due to surface.
- Paper published on arXiv with ID 2605.30454.
Entities
Institutions
- arXiv