MirrorCheck: New Defense Against VLM Adversarial Attacks

ai-technology · 2026-05-25

Researchers have introduced MirrorCheck, a model-agnostic detection framework designed to protect Vision-Language Models (VLMs) against sophisticated adversarial attacks, including adaptive ones. The framework uses Text-to-Image (T2I) models to regenerate an image from the caption that the targeted VLM produces for the input. It then measures semantic consistency by comparing the feature-space embeddings of the original and the regenerated image.

To counter adaptive attacks, MirrorCheck adds a stochastic defense that randomly selects the T2I generator and the image encoder from a diverse pool of models. It also applies a One-Time-Use (OTU) perturbation to the selected encoder's embeddings, with a scaling factor controlling its strength, to further reduce attack effectiveness. The authors report that the method is effective in both unimodal and multimodal settings. The paper is available on arXiv (ID 2406.09250).
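The caption-regenerate-compare loop described above can be sketched in a few lines. This is a minimal toy sketch, not the authors' implementation: the cosine-similarity score, the fixed `threshold`, and every name in it (`mirror_check`, `toy_vlm`, `toy_t2i`, `identity`) are hypothetical, and plain 3-d feature lists stand in for real images, VLMs, T2I models and encoders.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mirror_check(
    image: Vector,
    caption_fn: Callable[[Vector], str],    # stands in for the target VLM's captioner
    t2i_fn: Callable[[str], Vector],        # stands in for a text-to-image generator
    encode_fn: Callable[[Vector], Vector],  # stands in for an image encoder
    threshold: float,                       # hypothetical decision threshold
) -> Tuple[bool, float]:
    """Return (flagged_as_adversarial, similarity_score)."""
    caption = caption_fn(image)       # 1. caption the input with the target VLM
    regenerated = t2i_fn(caption)     # 2. regenerate an image from that caption
    score = cosine_similarity(encode_fn(image), encode_fn(regenerated))  # 3. compare embeddings
    return score < threshold, score   # 4. low consistency -> flag the input


# Toy stand-ins (hypothetical): "images" are already 3-d feature vectors.
def toy_vlm(img: Vector) -> str:
    # Pretend the attack exploits feature 2 to make the VLM say "dog".
    return "dog" if img[2] > 0.05 else "cat"


def toy_t2i(caption: str) -> Vector:
    return {"cat": [0.9, 0.1, 0.0], "dog": [0.0, 1.0, 0.0]}[caption]


def identity(x: Vector) -> Vector:
    return list(x)


clean = [1.0, 0.0, 0.0]
adversarial = [1.0, 0.0, 0.1]  # close to the clean input, but captioned "dog"
print(mirror_check(clean, toy_vlm, toy_t2i, identity, threshold=0.5)[0])        # False
print(mirror_check(adversarial, toy_vlm, toy_t2i, identity, threshold=0.5)[0])  # True
```

Cosine similarity is used here only because it is a common way to compare encoder embeddings; the paper's actual consistency measure and threshold may differ.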

Key facts

MirrorCheck is a model-agnostic detection framework for VLMs.
It uses T2I models to regenerate images from captions.
Semantic consistency is assessed via feature-space embeddings.
A stochastic defense randomly selects T2I generators and image encoders.
A One-Time-Use (OTU) perturbation is applied to the encoder embeddings.
The framework works in unimodal and multimodal settings.
The paper is on arXiv: 2406.09250.
It addresses adaptive adversarial attacks.
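The two countermeasures against adaptive attacks, random model selection and the OTU perturbation, can be illustrated with a similar toy sketch. The summary does not specify the exact form of the OTU perturbation; here it is assumed to be Gaussian noise drawn fresh for each query and multiplied by the scaling factor `scale`. The pool contents, the threshold and all names (`stochastic_mirror_check`, `otu_perturb`, `gen_a`, `enc_doubled`, and so on) are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def otu_perturb(embedding: Sequence[float], scale: float, rng: random.Random) -> Vector:
    """Assumed OTU form: Gaussian noise drawn for this query only, times `scale`."""
    return [x + scale * rng.gauss(0.0, 1.0) for x in embedding]


def stochastic_mirror_check(
    image: Vector,
    caption_fn: Callable[[Vector], str],
    t2i_pool: Dict[str, Callable[[str], Vector]],
    encoder_pool: Dict[str, Callable[[Vector], Vector]],
    threshold: float,
    scale: float,
    rng: Optional[random.Random] = None,
) -> Tuple[bool, float, Tuple[str, str]]:
    """Return (flagged, score, (t2i_name, encoder_name)) for one query."""
    if rng is None:
        rng = random.Random()  # unseeded: choices differ from query to query
    # Pick a generator and an encoder at random for this query only.
    t2i_name = rng.choice(sorted(t2i_pool))
    enc_name = rng.choice(sorted(encoder_pool))
    encode = encoder_pool[enc_name]

    regenerated = t2i_pool[t2i_name](caption_fn(image))
    # Perturb both embeddings with noise that is used once and then discarded.
    emb_orig = otu_perturb(encode(image), scale, rng)
    emb_regen = otu_perturb(encode(regenerated), scale, rng)
    score = cosine_similarity(emb_orig, emb_regen)
    return score < threshold, score, (t2i_name, enc_name)


# Toy stand-ins (hypothetical): "images" are 3-d feature vectors.
def toy_vlm(img: Vector) -> str:
    return "dog" if img[2] > 0.05 else "cat"


T2I_POOL = {
    "gen_a": lambda c: {"cat": [0.9, 0.1, 0.0], "dog": [0.0, 1.0, 0.0]}[c],
    "gen_b": lambda c: {"cat": [0.8, 0.0, 0.1], "dog": [0.1, 0.9, 0.0]}[c],
}
ENCODER_POOL = {
    "enc_identity": lambda x: list(x),
    "enc_doubled": lambda x: [2.0 * v for v in x],
}

rng = random.Random(0)  # seeded here only so the example is reproducible
clean = [1.0, 0.0, 0.0]
adversarial = [1.0, 0.0, 0.1]
# For the clean input the flag is False; for the adversarial input it is True.
print(stochastic_mirror_check(clean, toy_vlm, T2I_POOL, ENCODER_POOL, 0.5, 0.01, rng))
print(stochastic_mirror_check(adversarial, toy_vlm, T2I_POOL, ENCODER_POOL, 0.5, 0.01, rng))
```

Because the generator, the encoder and the noise can all change between queries, an attacker has no single fixed detection pipeline to optimise against. Seeding `rng` as above is only for reproducibility and would weaken that property in deployment.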
