LLM-Controlled Robots Vulnerable to Semantic Denial-of-Service Attacks
A recent study indicates that robots controlled by large language models (LLMs) exhibit a vulnerability due to their adherence to safety protocols. An attacker can exploit this by inserting brief safety-sounding phrases (1-5 tokens) into the robot's audio input, causing the model's safety logic to interrupt or cease operations without needing to bypass the model's safeguards. This type of semantic denial-of-service attack leads the robot to halt, as the injected phrases resemble valid warnings. The investigation assessed four vision-language models, seven prompt-level defenses, and three deployment configurations, including both single and multiple injections. Results reveal that prompt-only defenses compromise between mitigating attacks and responding to actual dangers. While the most effective defenses lessen hard-stop attack success in some models, they merely transform the disruption into acknowledge loops and false alerts.
Key facts
- Safety-oriented instruction-following in LLM-controlled robots creates an availability attack surface.
- Short safety-plausible phrases (1-5 tokens) injected into a robot's audio channel can trigger safety reasoning to halt or disrupt execution.
- The attack does not require jailbreaking the model or overriding its policy.
- The attack is a semantic denial-of-service: the agent stops because the injected signal looks like a legitimate alert.
- Four vision-language models were tested.
- Seven prompt-level defenses were evaluated.
- Three deployment modes were considered.
- Prompt-only defenses trade off attack suppression against genuine hazard response.
Entities
—