ARTFEED — Contemporary Art Intelligence

DistractMIA: Black-Box Membership Inference Attack on Vision-Language Models

ai-technology · 2026-05-14

Researchers have introduced DistractMIA, a black-box membership inference attack on vision-language models (VLMs) that relies solely on generated text responses. Unlike earlier attacks that depend on probability-level signals or mask-based outputs, DistractMIA composites a semantic distractor into the original image and measures how the model's responses change. The method rests on the intuition that member samples are more strongly anchored to the original image's semantics, so their responses shift less when a distractor is present. The paper is available on arXiv under identifier 2605.12574.
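The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `query_vlm` is a stand-in for a black-box VLM API with canned toy outputs, `add_distractor` stands in for compositing a distractor into the image, and the Jaccard-based change score and threshold are illustrative assumptions.

```python
# Hedged sketch of a DistractMIA-style attack loop. All function bodies are
# toy stand-ins; only generated text is used, matching the black-box setting.

def query_vlm(image: str, prompt: str = "Describe this image.") -> str:
    """Placeholder for a deployed VLM that returns only generated text."""
    # Toy behaviour: the "member" image yields a stable description even with
    # a distractor; the non-member's description drifts toward the distractor.
    canned = {
        "member.png": "a brown dog running on a grassy field",
        "member.png+distractor": "a brown dog running on a grassy field near a sign",
        "nonmember.png": "a red car parked on a street",
        "nonmember.png+distractor": "a large yellow warning sign on a street",
    }
    return canned[image]

def add_distractor(image: str) -> str:
    """Placeholder for inserting a semantic distractor into the image."""
    return image + "+distractor"

def response_change(a: str, b: str) -> float:
    """Jaccard distance over token sets: higher means a larger semantic shift."""
    ta, tb = set(a.split()), set(b.split())
    return 1.0 - len(ta & tb) / len(ta | tb)

def infer_membership(image: str, threshold: float = 0.5) -> bool:
    """Flag as member if the response stays anchored to the original semantics."""
    original = query_vlm(image)
    distracted = query_vlm(add_distractor(image))
    return response_change(original, distracted) < threshold
```

With the canned outputs above, `infer_membership("member.png")` returns `True` and `infer_membership("nonmember.png")` returns `False`, illustrating the member/non-member asymmetry the attack exploits.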

Key facts

  • DistractMIA is a black-box membership inference attack for VLMs.
  • It requires only generated textual responses, not probability-level signals.
  • The method inserts a semantic distractor into the original image.
  • It measures how generated responses change with the distractor.
  • Member samples are more strongly anchored to the original image's semantics, so their responses shift less.
  • The paper is on arXiv with ID 2605.12574.
  • The attack is designed for deployed VLMs.
  • It addresses limitations of existing VLM membership inference attacks.
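The "how responses change" bullet needs a concrete distance between the before/after texts. The paper's scoring function is not specified here; as an assumed stand-in, a string-similarity-based change score can be computed with Python's standard library:

```python
from difflib import SequenceMatcher

def change_score(before: str, after: str) -> float:
    """1 minus string similarity: larger values mean the distractor pulled
    the response further from the original image's semantics (a non-member
    signal under the attack's intuition). Illustrative metric, not the paper's."""
    return 1.0 - SequenceMatcher(None, before, after).ratio()

# A member-like pair stays close; a drifted pair scores higher.
anchored = change_score("a dog on a field", "a dog on a field by a sign")
drifted = change_score("a dog on a field", "a yellow warning sign")
```

Here `anchored` comes out smaller than `drifted`, so thresholding such a score separates responses that stayed anchored from those that drifted toward the distractor.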

Entities

Institutions

  • arXiv
