Domain-Camouflaged Injection Attacks Evade LLM Detection Systems

ai-technology · 2026-05-23

A recent study published on arXiv reports that injection detection systems for LLM agents struggle when payloads mimic the vocabulary and authority structures of the target domain, an attack the authors call domain-camouflaged injection. Detection rates fell from 93.8% to 9.7% on Llama 3.1 8B and from 100% to 55.6% on Gemini 2.0 Flash. The resulting Camouflage Detection Gap (CDG) was statistically significant across 45 tasks spanning three domains and two model families. Notably, Llama Guard 3, a safety classifier built for production use, detected none of the camouflaged injections.

Key facts

arXiv paper 2605.22001 identifies domain-camouflaged injection attacks.
Detection rates fell from 93.8% to 9.7% on Llama 3.1 8B.
Detection rates fell from 100% to 55.6% on Gemini 2.0 Flash.
Camouflage Detection Gap (CDG) is formalized as the difference in detection rates between non-camouflaged and camouflaged injections.
CDG was statistically significant (chi^2 = 38.03 for Llama, chi^2 = 17.05 for Gemini).
Zero reverse discordant pairs were observed.
Llama Guard 3 detected zero camouflaged injections.
Study covered 45 tasks across three domains and two model families.
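
The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, and the summary does not say which test produced the chi^2 values. The mention of discordant pairs suggests a paired test such as McNemar's, so the sketch uses the continuity-corrected McNemar statistic. The discordant-pair counts (40 and 19) are also assumptions, chosen only because they give values close to the reported 38.03 and 17.05 when there are zero reverse discordant pairs.

```python
def camouflage_detection_gap(plain_rate: float, camouflaged_rate: float) -> float:
    """CDG as the difference in detection rates, in percentage points."""
    return plain_rate - camouflaged_rate


def mcnemar_corrected(b: int, c: int) -> float:
    """Continuity-corrected McNemar statistic for paired outcomes.

    b: pairs detected in plain form but missed when camouflaged
    c: pairs missed in plain form but detected when camouflaged
       (the "reverse" discordant pairs)
    """
    if b + c == 0:
        raise ValueError("at least one discordant pair is required")
    return (abs(b - c) - 1) ** 2 / (b + c)


# Detection rates reported in the summary (percent).
llama_cdg = camouflage_detection_gap(93.8, 9.7)
gemini_cdg = camouflage_detection_gap(100.0, 55.6)

# Hypothetical discordant-pair counts (assumed, not taken from the paper),
# with zero reverse discordant pairs as reported.
llama_stat = mcnemar_corrected(40, 0)
gemini_stat = mcnemar_corrected(19, 0)

print(f"Llama 3.1 8B CDG: {llama_cdg:.1f} points, statistic {llama_stat:.3f}")
print(f"Gemini 2.0 Flash CDG: {gemini_cdg:.1f} points, statistic {gemini_stat:.3f}")
```

With zero reverse discordant pairs the statistic depends only on how many injections were caught in plain form and missed when camouflaged, which is why the gap can be highly significant even for the model with the smaller drop.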
