Comic-Based Jailbreaks Threaten Multimodal AI Safety
A recent study posted on arXiv shows that comic-style visual narratives can circumvent the safety measures of multimodal large language models (MLLMs). The researchers introduce ComicJailbreak, a benchmark of 1,167 attack instances spanning 10 harm categories and 5 task setups, each embedding a harmful objective in a simple three-panel comic. Evaluated against 15 state-of-the-art MLLMs (6 commercial, 9 open-source), the comic-based attacks achieved success rates comparable to strong rule-based jailbreaks, with ensemble success rates exceeding 90% on several commercial models. Existing defenses mitigated the harmful comics but introduced performance trade-offs. The study highlights a new safety vulnerability in MLLMs exposed to visually grounded instructions.
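To make the attack structure concrete, here is a minimal sketch of what one benchmark instance might look like; the class, field, and method names are illustrative assumptions, not a schema taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical shape of one ComicJailbreak attack instance; the paper does
# not publish this schema, so every name here is an assumption.
@dataclass
class ComicAttackInstance:
    harm_category: str   # one of the 10 harm categories
    task_setup: str      # one of the 5 task setups
    harmful_goal: str    # the objective the attack tries to elicit
    panels: list[str]    # scene descriptions for the three comic panels

    def render_prompt(self) -> str:
        """Flatten the panels into the visual-narrative prompt shown to the MLLM."""
        return "\n".join(f"Panel {i + 1}: {p}" for i, p in enumerate(self.panels))
```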
Key facts
- ComicJailbreak benchmark includes 1,167 attack instances
- Covers 10 harm categories and 5 task setups
- Tested on 15 state-of-the-art MLLMs (6 commercial, 9 open-source)
- Ensemble success rates exceeded 90% on several commercial models (see the sketch after this list)
- Comic-based attacks match strong rule-based jailbreaks
- Outperform plain-text and random-image baselines
- Existing defenses effective but induce trade-offs
- Study published on arXiv (2603.21697)
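As a rough illustration of the ensemble figure above, the sketch below counts an instance as an ensemble success when any attack variant jailbreaks the model; the function name and data layout are assumptions, not details from the paper.

```python
# Hypothetical ensemble attack-success-rate computation: an instance counts
# as jailbroken if ANY attack variant succeeds against the model.
def ensemble_asr(results: dict[str, list[bool]]) -> float:
    """results maps attack-variant name -> per-instance success flags."""
    variants = list(results.values())
    n = len(variants[0])
    # An instance is an ensemble success if at least one variant succeeded.
    successes = sum(any(v[i] for v in variants) for i in range(n))
    return successes / n

# Example: three variants over four instances -> ensemble ASR of 0.75.
print(ensemble_asr({
    "comic_v1": [True, False, False, False],
    "comic_v2": [False, True, False, False],
    "comic_v3": [False, False, True, False],
}))
```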
Entities
Benchmarks
- JailbreakBench
- JailbreakV
Repositories
- arXiv