ARTFEED — Contemporary Art Intelligence

Reconstruction-Concealment Tradeoff in MLLM Jailbreak Attacks

ai-technology · 2026-05-09

A new arXiv paper (arXiv:2605.05709) identifies a fundamental tradeoff in jailbreak attacks on multimodal large language models (MLLMs). Intent-obfuscation attacks transform a harmful query into a concealed multimodal input in order to bypass safety filters. The study argues that these attacks are governed by a reconstruction-concealment tradeoff: the transformed input must hide the harmful intent while still allowing the original query to be recovered. In an analysis of three representative black-box methods, the authors find that existing transformations struggle to balance the two requirements, whereas character-removed variants of the query balance them better. Building on this, they propose concealment-aware variant construction, which greedily selects diverse character-removed variants with low harmful-keyword alignment and is instantiated through five modality-aware prompting strategies.

Key facts

  • Paper ID: arXiv:2605.05709
  • Title: Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs
  • Focuses on intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs)
  • Identifies a reconstruction-concealment tradeoff governing such attacks
  • Analyzes three representative black-box methods
  • Finds existing transformations struggle to balance the tradeoff
  • Character-removed variants achieve a better balance than the existing transformations
  • Proposes concealment-aware variant construction with five modality-aware prompting strategies

Entities

Institutions

  • arXiv
