ARTFEED — Contemporary Art Intelligence

DMN Framework Jailbreaks Multimodal LLMs with Multi-Image Inputs

ai-technology · 2026-05-20

Researchers propose DMN, a compositional jailbreak framework targeting multimodal large language models (MLLMs) that accept multi-image inputs. Unlike prior single-image methods, DMN distributes a harmful instruction across multiple images, supplements it with multimodal evidence, and adds a number chain task to distract the model. In the reported experiments, attack success rates exceed 90% on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4. The paper highlights vulnerabilities that stem from insufficient safety alignment on multi-image inputs.

Key facts

  • DMN stands for Distributed instruction, Multimodal evidence, and Number chain task.
  • DMN achieves an attack success rate above 90% on GPT-4o, Gemini-2.5-pro, and Claude Sonnet 4.
  • It exploits multi-image inputs to bypass safety alignment.
  • Previous methods used only single images, which limited the attack space.
  • The paper is published on arXiv with ID 2605.18915.

Entities

Institutions

  • arXiv
