ARTFEED — Contemporary Art Intelligence

Sparse Token Attack Bypasses Audio Language Model Safety

ai-technology · 2026-05-07

Researchers have developed a sparse jailbreak method for audio language models (ALMs) that achieves high attack success rates while updating only a small fraction of the audio waveform. The work, titled "Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization", was published on arXiv (ID: 2605.04700). The team found that gradient energy in ALMs is concentrated on a small subset of token-aligned audio regions. Building on this observation, they proposed Token-Aware Gradient Optimization (TAGO), which masks low-energy gradients at each iteration and retains only those aligned with high-energy tokens. In tests on three ALMs, TAGO outperformed baseline attacks. Notably, on Qwen3-Omni the attack success rate (ASR_l) remained at 86% even under substantial sparsification. This challenges the assumption that dense waveform perturbation is necessary for jailbreaking and reveals a vulnerability in current audio safety mechanisms.
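The core idea can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes a fixed number of waveform samples per audio token, measures per-token gradient energy as an L2 norm, and zeroes out gradients everywhere except the top-energy token regions before a signed-gradient update. The function name, `frames_per_token`, and `keep_ratio` are hypothetical parameters chosen for the sketch.

```python
import numpy as np

def tago_step(waveform, grad, frames_per_token=160, keep_ratio=0.1, step_size=0.001):
    """One TAGO-style update (sketch): keep gradients only in the
    highest-energy token-aligned regions, zero out the rest."""
    n = len(grad) // frames_per_token                 # number of token-aligned regions
    g = grad[: n * frames_per_token].reshape(n, frames_per_token)
    energy = np.linalg.norm(g, axis=1)                # per-token gradient energy
    k = max(1, int(keep_ratio * n))                   # how many regions to retain
    keep = np.argsort(energy)[-k:]                    # indices of top-energy tokens
    mask = np.zeros_like(g)
    mask[keep] = 1.0                                  # mask out low-energy gradients
    sparse_grad = np.zeros_like(grad)
    sparse_grad[: n * frames_per_token] = (g * mask).ravel()
    # signed-gradient step applied only within the retained regions
    return waveform + step_size * np.sign(sparse_grad)
```

With `keep_ratio=0.1`, only about 10% of token regions receive any perturbation at each iteration, which is the sense in which the optimization is sparse relative to dense full-waveform attacks.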

Key facts

  • arXiv paper ID: 2605.04700
  • Jailbreak attacks on ALMs typically update entire waveform densely
  • Gradient energy is highly non-uniform across audio tokens
  • TAGO enables sparse optimization by masking low-energy gradients
  • TAGO tested on three ALMs, outperforms baselines
  • On Qwen3-Omni, ASR_l remains 86% with substantial sparsification
  • Method reveals vulnerability in audio safety mechanisms

Entities

Institutions

  • arXiv

Sources