MirrorCheck: New Defense Against VLM Adversarial Attacks
Researchers have introduced MirrorCheck, a model-agnostic detection framework designed to protect Vision-Language Models (VLMs) from advanced adversarial attacks, including adaptive ones. The framework uses Text-to-Image (T2I) models to regenerate an image from the caption the targeted VLM produces for the input, then assesses semantic consistency by comparing the feature-space embeddings of the original and regenerated images. To counter adaptive attacks, MirrorCheck adds a stochastic defense that randomly selects the T2I generator and the image encoder from a diverse pool of models. It also applies a One-Time-Use (OTU) perturbation to the embeddings of the selected encoder, with a scaling factor controlling its strength, to further reduce attack effectiveness. According to the authors, the method is effective in both unimodal and multimodal settings. The paper is available on arXiv under ID 2406.09250.
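The core consistency check can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes semantic consistency is scored with cosine similarity between the two embeddings and compared against a fixed threshold, and the names `mirror_check`, `toy_vlm`, `toy_t2i`, `PROTOTYPES`, and the threshold value 0.8 are hypothetical. Real VLMs, T2I generators, and image encoders are replaced with toy stand-ins that operate on small feature vectors, so the example runs without any model weights.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mirror_check(
    image: Vector,
    caption_fn: Callable[[Vector], str],
    t2i_fn: Callable[[str], Vector],
    encode_fn: Callable[[Vector], Vector],
    threshold: float,
) -> Tuple[bool, float]:
    """Return (is_adversarial, score) for one input image (assumed scoring rule)."""
    caption = caption_fn(image)      # the target VLM describes the input
    regenerated = t2i_fn(caption)    # a T2I model turns the caption back into an image
    score = cosine_similarity(encode_fn(image), encode_fn(regenerated))
    return score < threshold, score  # low consistency -> flag as adversarial


# Hypothetical toy stand-ins: "images" are 3-d feature vectors.
PROTOTYPES = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}


def toy_vlm(image: Vector) -> str:
    # Simulated successful attack: a tiny change in the last coordinate hijacks the caption.
    if image[2] > 0.01:
        return "dog"
    return max(PROTOTYPES, key=lambda label: cosine_similarity(image, PROTOTYPES[label]))


def toy_t2i(caption: str) -> Vector:
    return list(PROTOTYPES[caption])


def identity_encoder(v: Vector) -> Vector:
    return list(v)


clean = [0.9, 0.1, 0.0]
adversarial = [0.9, 0.1, 0.05]  # nearly identical to `clean`
clean_flag, clean_score = mirror_check(clean, toy_vlm, toy_t2i, identity_encoder, 0.8)
adv_flag, adv_score = mirror_check(adversarial, toy_vlm, toy_t2i, identity_encoder, 0.8)
print(clean_flag, f"{clean_score:.3f}")  # False 0.994
print(adv_flag, f"{adv_score:.3f}")      # True 0.110
```

In the toy run, the clean input is captioned "cat", the regenerated prototype closely matches it, and the check passes. The perturbed input is captioned "dog", so the regenerated image no longer resembles the input and the low similarity flags it. With real models, the threshold would presumably need to be calibrated on clean data.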
Key facts
- MirrorCheck is a model-agnostic detection framework for VLMs.
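- It uses T2I models to regenerate images from captions produced by the target VLM.
- Semantic consistency is assessed by comparing feature-space embeddings of the original and regenerated images.
- A stochastic defense randomly selects T2I generators and image encoders from a diverse model pool.
- A One-Time-Use (OTU) perturbation, controlled by a scaling factor, is applied to the selected encoder's embeddings.
- The framework is reported to work in both unimodal and multimodal settings.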
- The paper is on arXiv: 2406.09250.
- It addresses adaptive adversarial attacks.
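The stochastic defense and the OTU perturbation can be layered on top of the same consistency check. The sketch below is hypothetical throughout: the source does not specify how the OTU perturbation is parameterized, so here it is assumed to be a set of random per-dimension gains, scaled by `scale`, drawn once per query, applied to both embeddings from the chosen encoder, and then discarded. The model pools, `otu_gains`, and `stochastic_mirror_check` are illustrative names, not part of any released code, and cosine similarity with a fixed threshold is again an assumed scoring rule.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def otu_gains(dim: int, scale: float, rng: random.Random) -> Vector:
    """Assumed OTU form: per-dimension gains drawn fresh for a single query."""
    return [1.0 + scale * rng.gauss(0.0, 1.0) for _ in range(dim)]


def stochastic_mirror_check(
    image: Vector,
    caption_fn: Callable[[Vector], str],
    t2i_pool: Dict[str, Callable[[str], Vector]],
    encoder_pool: Dict[str, Callable[[Vector], Vector]],
    threshold: float,
    scale: float,
    rng: random.Random,
) -> Tuple[bool, float, Tuple[str, str]]:
    # Randomly choose a generator and an encoder for this query only.
    t2i_name = rng.choice(sorted(t2i_pool))
    enc_name = rng.choice(sorted(encoder_pool))
    encode = encoder_pool[enc_name]
    # Caption with the target VLM, then regenerate an image from that caption.
    regenerated = t2i_pool[t2i_name](caption_fn(image))
    e_orig, e_regen = encode(image), encode(regenerated)
    # One-time perturbation: same gains for both embeddings, never reused after this call.
    gains = otu_gains(len(e_orig), scale, rng)
    e_orig = [g * x for g, x in zip(gains, e_orig)]
    e_regen = [g * x for g, x in zip(gains, e_regen)]
    score = cosine_similarity(e_orig, e_regen)
    return score < threshold, score, (t2i_name, enc_name)


# Hypothetical toy stand-ins over a 3-d "image" feature space.
PROTOTYPES = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}


def toy_vlm(image: Vector) -> str:
    if image[2] > 0.01:  # simulated successful attack hijacks the caption
        return "dog"
    return max(PROTOTYPES, key=lambda label: cosine_similarity(image, PROTOTYPES[label]))


T2I_POOL: Dict[str, Callable[[str], Vector]] = {
    "gen_a": lambda caption: list(PROTOTYPES[caption]),
    "gen_b": lambda caption: [2.0 * x for x in PROTOTYPES[caption]],
}
ENCODER_POOL: Dict[str, Callable[[Vector], Vector]] = {
    "identity": lambda v: list(v),
    "reversed": lambda v: list(reversed(v)),
}

rng = random.Random(0)
clean = [0.9, 0.1, 0.0]
adversarial = [0.9, 0.1, 0.05]
for img in (clean, adversarial):
    flagged, score, models = stochastic_mirror_check(
        img, toy_vlm, T2I_POOL, ENCODER_POOL, threshold=0.8, scale=0.05, rng=rng
    )
    print(flagged, round(score, 3), models)
```

Because the generator, the encoder, and the perturbation all change from query to query, an adaptive attacker cannot optimize against one fixed feature space, while applying the same perturbation to both embeddings keeps the original-versus-regenerated comparison meaningful within a single query. Setting `scale` to 0 removes the perturbation but keeps the random model selection.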
Entities
Institutions
- arXiv