DarkLLM: LLM-Driven Adversarial Attacks on Vision Models

ai-technology · 2026-05-20

DarkLLM is an attack framework that uses a large language model to convert natural-language instructions into latent attack vectors, which are then decoded into visual adversarial perturbations. It unifies targeted, untargeted, segmentation, and multi-model attacks in a single system, enabling flexible and controllable adversarial generation across different models. The paper is available on arXiv under the identifier 2605.18868.

Key facts

DarkLLM trains an LLM to translate natural-language attack instructions into latent attack vectors.
The latent vectors are decoded into visual adversarial perturbations.
The framework unifies targeted, untargeted, segmentation, and multi-model attacks.
It enables flexible and controllable adversarial generation.
The research is published on arXiv (2605.18868).
Vision and multimodal foundation models are vulnerable to adversarial attacks.
Traditional attacks are limited to single, predefined objectives.
DarkLLM uses natural-language instruction tuning.
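
The flow described above (instruction, then latent attack vector, then decoded perturbation) can be illustrated with a toy sketch. This is not the paper's implementation: the hash-based `instruction_to_latent` merely stands in for the tuned LLM, the random linear decoder stands in for a learned one, and `LATENT_DIM`, `IMAGE_SHAPE`, and the L-infinity budget `EPSILON` are all hypothetical values chosen for illustration, since the summary does not specify any of them.

```python
import hashlib
import math
import random

LATENT_DIM = 8          # hypothetical latent size (not from the source)
IMAGE_SHAPE = (4, 4)    # tiny grayscale image, for illustration only
EPSILON = 8 / 255       # assumed L-infinity perturbation budget


def instruction_to_latent(instruction: str) -> list:
    """Stand-in for the tuned LLM: hash the text into a deterministic latent vector."""
    digest = hashlib.sha256(instruction.encode("utf-8")).digest()
    # Map the first LATENT_DIM bytes to floats in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:LATENT_DIM]]


def make_decoder(seed: int = 0) -> list:
    """Random linear decoder weights, one row per pixel; a real decoder would be learned."""
    rng = random.Random(seed)
    n_pixels = IMAGE_SHAPE[0] * IMAGE_SHAPE[1]
    return [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(n_pixels)]


def decode_perturbation(latent: list, weights: list) -> list:
    """Decode a latent vector into a per-pixel perturbation; tanh keeps |delta| <= EPSILON."""
    return [
        EPSILON * math.tanh(sum(w * z for w, z in zip(row, latent)))
        for row in weights
    ]


def apply_perturbation(image: list, delta: list) -> list:
    """Add the perturbation and clip each pixel to the valid range [0, 1]."""
    return [min(1.0, max(0.0, x + d)) for x, d in zip(image, delta)]


weights = make_decoder()
image = [0.5] * (IMAGE_SHAPE[0] * IMAGE_SHAPE[1])  # flat mid-gray image
latent = instruction_to_latent("make the classifier predict 'cat'")
delta = decode_perturbation(latent, weights)
adversarial = apply_perturbation(image, delta)
```

The point of the sketch is the interface: a single text-conditioned latent drives the decoder, so changing the instruction changes the perturbation without swapping the attack code. How DarkLLM actually trains the LLM and decoder, and how it constrains the perturbation, is described in the paper rather than here.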
