CodecAttack: Robust Adversarial Perturbations for Audio LLMs
Researchers have developed CodecAttack, an adversarial attack on Audio Large Language Models (Audio LLMs) that remains effective after codec compression is applied as a preprocessing step. Prior attacks perturb the audio waveform, and codec compression can detect and remove those perturbations. CodecAttack instead optimizes perturbations within a neural audio codec's continuous latent space, exploiting the codec's own compression channel: the channel discards waveform perturbations but transmits those crafted in its latent space. To improve robustness across real-world compression channels, the method applies multi-bitrate straight-through Expectation-over-Transformation (EoT) and does not modify the target model. The attack was evaluated in three realistic Audio LLM deployment scenarios against three target models and was consistently effective. The work highlights a critical vulnerability in defenses that rely on codec compression to counter adversarial attacks on audio AI systems.
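One way to write down the kind of objective this summary describes is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: z is the codec's continuous latent for the clean audio, delta is the latent perturbation, Q_b is the codec's quantizer at bitrate b drawn from a set B, D is the codec decoder, f is the unmodified target model, and L is the attacker's loss. The norm bound epsilon is likewise an assumed constraint.

```latex
\min_{\|\delta\|_\infty \le \epsilon}\;
\mathbb{E}_{b \sim \mathcal{B}}
\Big[ \mathcal{L}\big( f\big( D( Q_b(z + \delta) ) \big) \big) \Big],
\qquad
\frac{\partial Q_b(u)}{\partial u} \approx I
\quad \text{(straight-through approximation)}
```

In this reading, the expectation over bitrates is the EoT component, and the identity approximation lets gradients pass through the otherwise non-differentiable quantizer.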
Key facts
- CodecAttack optimizes perturbations in a neural audio codec's continuous latent space.
- Prior attacks on Audio LLMs used waveform-domain perturbations that codec compression can detect and remove.
- The codec's compression channel transmits perturbations crafted in its own latent space.
- Multi-bitrate straight-through Expectation-over-Transformation (EoT) is applied to harden the attack.
- The attack does not modify the target model.
- Tested across three realistic Audio LLM deployment scenarios and three target models.
- The research is published on arXiv with ID 2605.20519.
- The attack demonstrates robustness against codec compression defenses.
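The latent-space optimization with multi-bitrate straight-through EoT listed above can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in rather than the paper's setup: the "codec" is a rounding quantizer plus a sample-repeating decoder, the "target model" is a fixed linear score, and the names `latent_attack`, `BITRATE_STEPS`, `READOUT_WEIGHT`, and all hyperparameters are chosen here for illustration only.

```python
# Toy sketch of latent-space perturbation with multi-bitrate straight-through EoT.
# All components are hypothetical stand-ins, not the paper's codec, model, or settings.

LATENT_DIM = 8
UPSAMPLE = 2                      # toy decoder repeats each latent value twice
READOUT_WEIGHT = 0.25             # toy target model: fixed linear score of the waveform
BITRATE_STEPS = (0.05, 0.1, 0.2)  # coarser rounding step = lower simulated bitrate
Z_CLEAN = [-1.0 + 2.0 * i / (LATENT_DIM - 1) for i in range(LATENT_DIM)]


def quantize(z, step):
    """Uniform rounding quantizer: the lossy part of the toy codec channel."""
    return [step * round(v / step) for v in z]


def decode(z_q):
    """Toy linear decoder: repeat each quantized latent value UPSAMPLE times."""
    return [v for v in z_q for _ in range(UPSAMPLE)]


def model_score(wave):
    """Toy target model, never modified by the attack."""
    return READOUT_WEIGHT * sum(wave)


def loss_and_grad(z_q, target=1.0):
    """Attacker's loss and its gradient with respect to the quantized latent."""
    score = model_score(decode(z_q))
    loss = (score - target) ** 2
    # The toy pipeline is linear, so d(score)/d(z_q[j]) is the same for every j.
    g = 2.0 * (score - target) * UPSAMPLE * READOUT_WEIGHT
    return loss, [g] * len(z_q)


def mean_loss(z):
    """Average attacker loss over all simulated bitrates."""
    return sum(loss_and_grad(quantize(z, s))[0] for s in BITRATE_STEPS) / len(BITRATE_STEPS)


def latent_attack(z_clean, iters=200, lr=0.1, eps=1.0):
    """Gradient descent on a latent perturbation, averaged over bitrates (EoT)."""
    delta = [0.0] * len(z_clean)
    for _ in range(iters):
        avg_grad = [0.0] * len(z_clean)
        for step in BITRATE_STEPS:
            z_q = quantize([z + d for z, d in zip(z_clean, delta)], step)
            _, grad = loss_and_grad(z_q)
            # Straight-through: reuse the gradient at the quantizer output as the
            # gradient at its input, as if rounding were the identity function.
            avg_grad = [a + g / len(BITRATE_STEPS) for a, g in zip(avg_grad, grad)]
        # Projected step: keep every perturbation entry within [-eps, eps].
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, avg_grad)]
    return delta
```

Calling `latent_attack(Z_CLEAN)` returns a perturbation whose effect can be checked by comparing `mean_loss` on the clean and perturbed latents. The true gradient of a rounding quantizer is zero almost everywhere, so without the straight-through substitution the loop would never move; averaging over several step sizes is what keeps the result from being tuned to a single simulated bitrate.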
Entities
Institutions
- arXiv