DMN Framework Jailbreaks Multimodal LLMs with Multi-Image Inputs

ai-technology · 2026-05-20

Researchers propose DMN, a compositional jailbreak framework targeting multimodal large language models (MLLMs) that accept multi-image inputs. Where prior methods rely on a single image, DMN distributes harmful instructions across multiple images, uses multimodal evidence, and introduces a number chain task to distract the model. The authors report attack success rates above 90% on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4. The paper highlights vulnerabilities arising from insufficient safety alignment on multi-image inputs.
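For readers unfamiliar with the metric, attack success rate is commonly computed as the share of attack attempts that a judge marks as successful. The sketch below illustrates only that arithmetic. The helper name `attack_success_rate` and the sample outcomes are hypothetical, and the summary does not describe how the paper judges an attempt, so this is an assumption about the general metric rather than the authors' evaluation protocol.

```python
from typing import Sequence


def attack_success_rate(judgements: Sequence[bool]) -> float:
    """Return the fraction of attempts judged successful.

    Hypothetical helper: each entry is True if a judge (human or
    automated) marked that attempt as a successful attack.
    """
    if not judgements:
        raise ValueError("need at least one judged attempt")
    # True counts as 1 and False as 0, so sum() counts the successes.
    return sum(judgements) / len(judgements)


# Made-up outcomes for 10 attempts, for illustration only;
# these are not results from the paper.
outcomes = [True] * 9 + [False]
rate = attack_success_rate(outcomes)
print(f"ASR = {rate:.0%}")  # prints "ASR = 90%"
```

A rate "over 90%" in this sense means that more than nine in ten judged attempts were counted as successes; what counts as a success depends entirely on the judging criteria used.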

Key facts

DMN stands for Distributed instruction, Multimodal evidence, and Number chain task.
DMN achieves an attack success rate of over 90% on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4.
It exploits multi-image inputs to bypass safety alignment.
Previous methods used only single images, which limited the attack space.
Published on arXiv with ID 2605.18915.
