Reconstruction-Concealment Tradeoff in MLLM Jailbreak Attacks

ai-technology · 2026-05-09

A new arXiv paper (2605.05709) identifies a fundamental tradeoff in jailbreak attacks on multimodal large language models (MLLMs). Intent-obfuscation attacks transform harmful queries into concealed multimodal inputs to bypass safety filters. The study shows that these attacks are governed by a reconstruction-concealment tradeoff: the transformed input must hide the harmful intent while remaining recoverable. An analysis of three representative black-box methods finds that existing transformations struggle to balance the two requirements, while character-removed variants achieve a better balance. The authors propose concealment-aware variant construction, which greedily selects diverse character-removed variants with low harmful-keyword alignment and is instantiated through five modality-aware prompting strategies.

Key facts

Paper ID: arXiv:2605.05709
Title: Conceal, Reconstruct, Jailbreak: Exploiting the Reconstruction-Concealment Tradeoff in MLLMs
Focuses on intent-obfuscation-based jailbreak attacks on multimodal large language models (MLLMs)
Identifies a reconstruction-concealment tradeoff governing such attacks
Analyzes three representative black-box methods
Finds existing transformations struggle to balance the tradeoff
Character-removed variants achieve a better balance
Proposes concealment-aware variant construction with five modality-aware prompting strategies
