StructBreak: New Framework Exposes Safety Failures in Multimodal LLMs
Researchers have identified a new vulnerability in Multimodal Large Language Models (MLLMs) called Structural Cognitive Overload (SCO), in which deep reasoning competes with safety alignment and leaves the model logically brittle. To measure it, they developed StructBreak, an automated end-to-end framework that quantifies SCO under a practical black-box setting, with no access to model internals. The framework establishes a benchmark spanning ten threat scenarios. Empirical evaluations on six leading MLLMs show a 92% average Attack Success Rate (ASR) in triggering toxic generation. The work points to a previously unexplored attack paradigm that goes beyond typographic and pixel-level perturbations.
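For readers unfamiliar with the metric, the sketch below shows one common way an average ASR can be computed: the fraction of attack attempts judged successful for each model, then an unweighted mean across models. This is an illustration only. The summary does not say how StructBreak judges success or whether its average is weighted, so the function names, the macro-averaging choice, and the toy data are all assumptions, not the authors' procedure or results.

```python
from statistics import mean


def attack_success_rate(outcomes):
    """Return the fraction of attack attempts judged successful.

    `outcomes` is a list of booleans, one per attempt (True = the attack
    elicited a harmful response). How success is judged is left outside
    this sketch.
    """
    if not outcomes:
        raise ValueError("need at least one attempt")
    return sum(outcomes) / len(outcomes)


def average_asr(per_model_outcomes):
    """Unweighted (macro) mean of per-model ASR values.

    Assumption: every model counts equally, regardless of how many
    attempts were run against it.
    """
    return mean(attack_success_rate(o) for o in per_model_outcomes.values())


# Hypothetical toy data for illustration; NOT results from the study.
toy_outcomes = {
    "model_a": [True, True, False, True],   # 3 of 4 succeed -> 0.75
    "model_b": [True, False, False, True],  # 2 of 4 succeed -> 0.50
}

print(average_asr(toy_outcomes))  # 0.625
```

A micro-average (pooling all attempts before dividing) would give a different number whenever models are tested with unequal numbers of attempts, so the averaging choice matters when comparing reported ASR figures.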
Key facts
- StructBreak is an automated end-to-end framework to quantify Structural Cognitive Overload (SCO) in MLLMs.
- SCO is a phenomenon in which deep reasoning and safety alignment compete, causing logical brittleness.
- Prior work focused on typographic and pixel-level perturbations, not SCO.
- StructBreak operates under a black-box setting, requiring no internal model access.
- The framework establishes a benchmark spanning ten diverse threat scenarios.
- Empirical evaluations on six leading MLLMs reveal a 92% average ASR for toxic generation.
- The research uncovers a novel higher-order cognitive overload attack paradigm.
- The study was published on arXiv with ID 2605.25534.
Entities
Institutions
- arXiv