Domain-Camouflaged Injection Attacks Evade LLM Detection Systems
A recent study posted to arXiv finds that injection detection systems for LLM agents break down when a payload mimics the vocabulary and authority frameworks of its target domain, an attack referred to as domain-camouflaged injection. Detection rates fell from 93.8% to 9.7% for Llama 3.1 8B and from 100% to 55.6% for Gemini 2.0 Flash. The resulting Camouflage Detection Gap (CDG) was statistically significant across 45 tasks spanning three domains and two model families. Notably, Llama Guard 3, a classifier built for production safety filtering, detected none of the camouflaged injections.
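The CDG is described as the difference in detection rates, so it can be computed directly from the reported figures. The sketch below is illustrative only: the function name is hypothetical, and the sign convention (non-camouflaged rate minus camouflaged rate, so a positive gap means camouflage hurts detection) is an assumption, not taken from the paper.

```python
# Hedged sketch: Camouflage Detection Gap (CDG) as the difference between
# detection rates on non-camouflaged and domain-camouflaged injections.
# Sign convention (non-camouflaged minus camouflaged) is an assumption.

def camouflage_detection_gap(standard_rate: float, camouflaged_rate: float) -> float:
    """Return the CDG in percentage points, rounded to one decimal place."""
    return round(standard_rate - camouflaged_rate, 1)

# Reported detection rates (percent): (non-camouflaged, camouflaged)
reported = {
    "Llama 3.1 8B": (93.8, 9.7),
    "Gemini 2.0 Flash": (100.0, 55.6),
}

for model, (standard, camouflaged) in reported.items():
    gap = camouflage_detection_gap(standard, camouflaged)
    print(f"{model}: CDG = {gap} percentage points")
```

On the reported numbers this gives a gap of 84.1 points for Llama 3.1 8B and 44.4 points for Gemini 2.0 Flash.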
Key facts
- arXiv paper 2605.22001 identifies domain-camouflaged injection attacks.
- Detection rates fell from 93.8% to 9.7% on Llama 3.1 8B.
- Detection rates fell from 100% to 55.6% on Gemini 2.0 Flash.
- The Camouflage Detection Gap (CDG) is formalized as the difference between detection rates on non-camouflaged and camouflaged injections.
- The CDG was statistically significant (χ² = 38.03 for Llama, χ² = 17.05 for Gemini).
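- Zero reverse discordant pairs were observed: in no paired case was a camouflaged injection detected while its non-camouflaged counterpart was missed.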
- Llama Guard 3 detected zero camouflaged injections.
- Study covered 45 tasks across three domains and two model families.
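The mention of discordant pairs suggests a paired test such as McNemar's, although this summary does not name the test. The sketch below assumes McNemar's chi-squared statistic with a continuity correction. The discordant-pair counts used (40 and 19) are hypothetical values chosen for illustration. They are not reported counts, though they happen to reproduce the published statistics.

```python
# Hedged sketch: McNemar's chi-squared statistic for paired detection outcomes.
# b = pairs where the non-camouflaged injection was detected but the camouflaged one was not
# c = "reverse" discordant pairs (camouflaged detected, non-camouflaged missed)
# The test choice and the counts below are assumptions for illustration only.

def mcnemar_chi2(b: int, c: int, continuity: bool = True) -> float:
    """McNemar's chi-squared statistic, optionally with a continuity correction."""
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction: |b - c| - 1
    return diff ** 2 / (b + c)

# With zero reverse discordant pairs (c = 0), the statistic reduces to (b - 1)^2 / b.
llama_chi2 = mcnemar_chi2(40, 0)   # hypothetical b for Llama 3.1 8B
gemini_chi2 = mcnemar_chi2(19, 0)  # hypothetical b for Gemini 2.0 Flash
print(f"Llama: {llama_chi2:.3f}, Gemini: {gemini_chi2:.3f}")
```

Under these assumptions the statistics come out to 38.025 and about 17.053, which round to the reported 38.03 and 17.05. With c = 0, every discordant pair favors the non-camouflaged condition, so the gap runs in a single direction.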
Entities
Institutions
- arXiv
Models
- Llama 3.1
- Gemini 2.0 Flash
- Llama Guard 3