LLM-Controlled Robots Vulnerable to Semantic Denial-of-Service Attacks

ai-technology · 2026-04-30

A recent study indicates that robots controlled by large language models (LLMs) exhibit a vulnerability due to their adherence to safety protocols. An attacker can exploit this by inserting brief safety-sounding phrases (1-5 tokens) into the robot's audio input, causing the model's safety logic to interrupt or cease operations without needing to bypass the model's safeguards. This type of semantic denial-of-service attack leads the robot to halt, as the injected phrases resemble valid warnings. The investigation assessed four vision-language models, seven prompt-level defenses, and three deployment configurations, including both single and multiple injections. Results reveal that prompt-only defenses compromise between mitigating attacks and responding to actual dangers. While the most effective defenses lessen hard-stop attack success in some models, they merely transform the disruption into acknowledge loops and false alerts.

Key facts

Safety-oriented instruction-following in LLM-controlled robots creates an availability attack surface.
Short safety-plausible phrases (1-5 tokens) injected into a robot's audio channel can trigger safety reasoning to halt or disrupt execution.
The attack does not require jailbreaking the model or overriding its policy.
The attack is a semantic denial-of-service: the agent stops because the injected signal looks like a legitimate alert.
Four vision-language models were tested.
Seven prompt-level defenses were evaluated.
Three deployment modes were considered.
Prompt-only defenses trade off attack suppression against genuine hazard response.

Entities

—

Sources

arXiv cs.AI — 2026-04-29