Clean-Label Backdoor Attack on Vision-Language Models via Diffusion Models
Researchers have introduced CBV, a clean-label backdoor attack on Vision-Language Models (VLMs) that uses diffusion models to generate naturally poisoned samples. Unlike prior attacks, which rely on visible triggers and altered text labels, CBV modifies the score during the diffusion model's reverse generation process, embedding trigger features in the image while preserving image-text alignment. Because the poisoned samples appear benign, the attack is designed to evade detection. Full details are available in a preprint on arXiv (2605.02202).
Key facts
- CBV stands for Clean-Label Backdoor Attack on VLMs via Diffusion Models.
- The attack targets Vision-Language Models (VLMs) used in image captioning and VQA.
- Existing backdoor attacks rely on visible triggers and modified text labels, creating detectable image-text mismatches.
- CBV uses diffusion models to generate poisoned samples via score matching.
- The attack modifies the score during reverse generation to embed triggered features.
- Textual information of the trigger is incorporated to enhance effectiveness.
- The research was announced as a new preprint on arXiv with ID 2605.02202.
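To make the score-modification idea above concrete, here is a minimal, illustrative sketch of Langevin-style reverse sampling in which an extra trigger term is added to the base score. This is not the paper's actual method; the additive form, the `guidance` weight, and the toy Gaussian score are all assumptions made for demonstration.

```python
import numpy as np

def reverse_step(x, score_fn, trigger_grad, step_size=0.01, guidance=0.5, rng=None):
    """One Langevin-style reverse step with a modified score.

    The base score score_fn(x) pulls samples toward the data distribution;
    the added trigger_grad(x) term (a hypothetical trigger direction) nudges
    them toward trigger-bearing images. Purely illustrative.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    modified_score = score_fn(x) + guidance * trigger_grad(x)
    noise = rng.standard_normal(x.shape)
    # Standard Langevin update: drift along the (modified) score plus noise.
    return x + step_size * modified_score + np.sqrt(2 * step_size) * noise

# Toy example: the score of a standard Gaussian is -x; the "trigger" term
# pulls samples toward a fixed pattern t.
t = np.ones(4)                      # hypothetical trigger direction
score_fn = lambda x: -x             # score of N(0, I)
trigger_grad = lambda x: t - x      # gradient pulling toward t

rng = np.random.default_rng(42)
x = np.zeros(4)
for _ in range(500):
    x = reverse_step(x, score_fn, trigger_grad, rng=rng)
# Samples now concentrate around a mean shifted toward t rather than 0.
```

With this additive modification the stationary distribution shifts from N(0, I) toward the trigger pattern, which is the intuition behind embedding trigger features during reverse generation while the samples still look like ordinary draws.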
Entities
Institutions
- arXiv