Sparse Token Attack Bypasses Audio Language Model Safety
Researchers have developed a sparse jailbreak method for audio language models (ALMs) that achieves high attack success rates while perturbing only a fraction of the audio waveform. The work, titled 'Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization', was posted on arXiv (ID: 2605.04700). The team found that gradient energy in ALMs is concentrated on a small subset of token-aligned audio regions. Building on this observation, they proposed Token-Aware Gradient Optimization (TAGO), which masks low-energy gradients at each iteration and retains only those aligned with high-energy tokens. In tests on three ALMs, TAGO outperformed baseline attacks; notably, on Qwen3-Omni, the attack success rate (ASR_l) remained at 86% even under substantial sparsification. The result challenges the assumption that dense waveform perturbation is necessary for jailbreaking and exposes a vulnerability in current audio safety mechanisms.
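The core mechanism lends itself to a compact illustration. Below is a minimal PyTorch sketch of token-aware gradient masking under stated assumptions: the toy scalar loss, the 320-sample token length, the keep ratio, the learning rate, and the function name tago_step are all hypothetical stand-ins, since the paper's actual loss, model, and hyperparameters are not given here. It is a sketch of the idea, not the authors' implementation.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for an ALM jailbreak objective: push a fixed linear model's
# scalar output toward a target. In the real attack the loss would come from
# the ALM itself; everything below is illustrative.
SAMPLE_RATE = 16_000
TOKEN_LEN = 320                      # assumed samples per audio token
NUM_TOKENS = SAMPLE_RATE // TOKEN_LEN
waveform = torch.randn(SAMPLE_RATE)  # 1 s of audio
w = torch.randn(SAMPLE_RATE)         # toy "model" weights
target = torch.tensor(1.0)

def loss_fn(x: torch.Tensor) -> torch.Tensor:
    return (x @ w - target) ** 2

delta = torch.zeros_like(waveform, requires_grad=True)

def tago_step(delta: torch.Tensor, keep_ratio: float = 0.1,
              lr: float = 1e-4) -> tuple[float, int]:
    """One token-aware sparse update: compute the full gradient, keep only
    the highest-energy token-aligned blocks, zero out the rest."""
    loss = loss_fn(waveform + delta)
    (grad,) = torch.autograd.grad(loss, delta)

    # Per-token gradient energy: squared L2 norm over each token's samples.
    blocks = grad.view(-1, TOKEN_LEN)        # (NUM_TOKENS, TOKEN_LEN)
    energy = blocks.pow(2).sum(dim=1)        # (NUM_TOKENS,)

    # Mask: retain only the top-k highest-energy tokens.
    k = max(1, int(keep_ratio * energy.numel()))
    mask = torch.zeros_like(energy)
    mask[torch.topk(energy, k).indices] = 1.0
    sparse_grad = (blocks * mask.unsqueeze(1)).view_as(grad)

    with torch.no_grad():
        delta -= lr * sparse_grad            # update only masked regions
    return loss.item(), k

for step in range(5):
    loss, k = tago_step(delta)
    print(f"step {step}: loss={loss:.4f}, active tokens={k}/{NUM_TOKENS}")
```

The point the sketch captures is that sparsity is imposed on the gradient, per token-aligned block, rather than on the perturbation directly, so each step touches only the few regions where gradient energy concentrates.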
Key facts
- arXiv paper ID: 2605.04700
- Jailbreak attacks on ALMs typically update the entire waveform densely
- Gradient energy is highly non-uniform across audio tokens (see the diagnostic sketch after this list)
- TAGO enables sparse optimization by masking low-energy gradients
- Evaluated on three ALMs, TAGO outperforms baseline attacks
- On Qwen3-Omni, ASR_l remains at 86% even under substantial sparsification
- Method reveals vulnerability in audio safety mechanisms
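The non-uniformity claim above can be probed with a simple diagnostic: rank token-aligned gradient blocks by energy and check what share of the total the top few carry. This sketch simulates a heavy-tailed gradient rather than backpropagating through a real ALM, so the shapes, the per-token scaling trick, and the fractions tested are all assumptions for illustration.

```python
import torch

torch.manual_seed(0)

# Diagnostic for gradient-energy concentration across token-aligned regions.
# The "gradient" is simulated with heavy-tailed per-token scales; in practice
# it would come from backpropagating the attack loss through an ALM.
NUM_TOKENS, TOKEN_LEN = 50, 320
scales = torch.rand(NUM_TOKENS, 1) ** 4          # a few tokens dominate
grad = torch.randn(NUM_TOKENS, TOKEN_LEN) * scales

energy = grad.pow(2).sum(dim=1)                  # per-token gradient energy
share = energy / energy.sum()
sorted_share, _ = share.sort(descending=True)

for frac in (0.05, 0.10, 0.25):
    k = max(1, int(frac * NUM_TOKENS))
    covered = sorted_share[:k].sum().item()
    print(f"top {frac:.0%} of tokens carry {covered:.1%} of gradient energy")
```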