Research Reveals Conjunctive Prompt Attacks in Multi-Agent LLM Systems
A new research paper identifies vulnerabilities in multi-agent LLM systems through conjunctive prompt attacks. These attacks exploit routing mechanisms where a trigger key in user queries combines with hidden adversarial templates in compromised remote agents. Unlike single-agent safety studies, this approach targets systems where multiple agents interact. Attackers manipulate only trigger placement and template insertion without altering model weights or client agents. The research demonstrates that routing-aware optimization significantly increases attack success across star, chain, and DAG topologies while maintaining low false activation rates. Current defenses including PromptGuard, Llama-Guard variants, and system-level controls like tool restrictions fail to reliably prevent these attacks. The study highlights overlooked security surfaces created by prompt segmentation and inter-agent routing in real-world applications. Published on arXiv as 2604.16543v1, this cross-announcement research addresses gaps in existing LLM safety evaluations.
Key facts
- Conjunctive prompt attacks exploit multi-agent LLM systems
- Attacks combine trigger keys in user queries with hidden adversarial templates in compromised agents
- Attackers control only trigger placement and template insertion without changing model weights
- Routing-aware optimization increases attack success across star, chain, and DAG topologies
- Existing defenses including PromptGuard and Llama-Guard variants fail to stop attacks
- Research published on arXiv as 2604.16543v1 with cross-announcement type
- Study focuses on systems where multiple agents interact rather than single-agent models
- Attack surfaces created by prompt segmentation and inter-agent routing
Entities
Institutions
- arXiv