New Backdoor Attack Targets Unified Autoregressive Models
A novel backdoor attack, named Token by Token Backdoor Attack (ToBAC), has been unveiled by researchers, targeting unified autoregressive models (UAMs) for the first time. These transformer models are capable of producing text and image tokens simultaneously in one autoregressive step, utilizing shared parameters and a multimodal vocabulary. The findings, released on arXiv, reveal that seemingly harmless characters or frequently used words can trigger detrimental actions in autoregressive image generation, affecting both visual and textual outputs. This manipulation enhances the believability of forged content. The attack investigates both data-based and model-based poisoning techniques, taking advantage of the unified structure to spread harmful impacts across various output modalities.
Key facts
- ToBAC is the first backdoor attack targeting unified autoregressive models.
- UAMs generate text and image tokens in a single autoregressive pass.
- The attack uses innocuous characters or common words as triggers.
- It manipulates both visual outputs and accompanying text.
- The study explores data-based and model-based poisoning strategies.
- The unified architecture enables multimodal backdoor attacks.
- The research was published on arXiv with ID 2605.19227.
- The attack increases the perceived authenticity of fabricated content.
Entities
Institutions
- arXiv