Research Reveals Conjunctive Prompt Attacks in Multi-Agent LLM Systems

ai-technology · 2026-04-22

A new research paper identifies vulnerabilities in multi-agent LLM systems through conjunctive prompt attacks. These attacks exploit routing mechanisms where a trigger key in user queries combines with hidden adversarial templates in compromised remote agents. Unlike single-agent safety studies, this approach targets systems where multiple agents interact. Attackers manipulate only trigger placement and template insertion without altering model weights or client agents. The research demonstrates that routing-aware optimization significantly increases attack success across star, chain, and DAG topologies while maintaining low false activation rates. Current defenses including PromptGuard, Llama-Guard variants, and system-level controls like tool restrictions fail to reliably prevent these attacks. The study highlights overlooked security surfaces created by prompt segmentation and inter-agent routing in real-world applications. Published on arXiv as 2604.16543v1, this cross-announcement research addresses gaps in existing LLM safety evaluations.

Key facts

Conjunctive prompt attacks exploit multi-agent LLM systems
Attacks combine trigger keys in user queries with hidden adversarial templates in compromised agents
Attackers control only trigger placement and template insertion without changing model weights
Routing-aware optimization increases attack success across star, chain, and DAG topologies
Existing defenses including PromptGuard and Llama-Guard variants fail to stop attacks
Research published on arXiv as 2604.16543v1 with cross-announcement type
Study focuses on systems where multiple agents interact rather than single-agent models
Attack surfaces created by prompt segmentation and inter-agent routing

Research Reveals Conjunctive Prompt Attacks in Multi-Agent LLM Systems

Key facts

Entities

Institutions

Sources