Membership Inference Attacks Reveal Privacy Risks in Masked Diffusion Language Models

other · 2026-05-20

A new arXiv preprint (2605.16445) investigates membership inference attacks (MIA) on fine-tuned Masked Diffusion Language Models (MDLMs) and finds them significantly more vulnerable than previously thought. The researchers extracted a 46-dimensional feature vector from the model's reconstruction loss at four masking ratios and trained XGBoost and MLP classifiers on it. On the MIMIR benchmark, across six text domains, XGBoost achieved a mean AUC of 0.878, peaking at 0.930 on Pile CC, and outperformed the SAMA grey-box baseline by 0.062 AUC on average. A leave-one-signal-out ablation showed that the ELBO trajectory alone drives most of the attack's success: removing it caused a mean drop of 0.130, while attention features contributed almost nothing (below 0.003). The study also designed a shadow model transfer attack in which K=3 surrogate MDLMs trained on unrelated domains generated the classifier labels. The work highlights the privacy risks of MDLMs, which replace autoregressive generation with iterative demasking and whose privacy properties had been largely unstudied.
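As a rough illustration of how loss-based features can be gathered at several masking ratios, the following Python sketch masks random positions, queries a loss function, and summarises the results per ratio. Everything specific in it is an assumption rather than the paper's method: the `masked_loss` callable, the ratios in `MASK_RATIOS`, the number of draws, and the four summary statistics are hypothetical, and the sketch yields 16 features, not the paper's 46.

```python
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical masking ratios; the summary only says that four were used.
MASK_RATIOS = (0.2, 0.4, 0.6, 0.8)


def loss_features(
    tokens: Sequence[int],
    masked_loss: Callable[[Sequence[int], List[bool]], float],
    ratios: Sequence[float] = MASK_RATIOS,
    draws: int = 8,
    seed: int = 0,
) -> List[float]:
    """Summarise reconstruction loss at several masking ratios.

    `masked_loss` is a stand-in for a call into the target model: it receives
    the tokens and a boolean mask and returns the loss on the masked positions.
    """
    rng = random.Random(seed)
    n = len(tokens)
    features: List[float] = []
    for ratio in ratios:
        k = min(n, max(1, round(ratio * n)))  # mask between 1 and n positions
        losses = []
        for _ in range(draws):
            masked = set(rng.sample(range(n), k))
            mask = [i in masked for i in range(n)]
            losses.append(masked_loss(tokens, mask))
        # Four summary statistics per ratio (an illustrative choice).
        features += [
            statistics.fmean(losses),
            statistics.pstdev(losses),
            min(losses),
            max(losses),
        ]
    return features


def toy_loss(tokens, mask):
    # Toy stand-in, not a real model: loss grows with the share of masked tokens.
    return sum(mask) / len(tokens)


feats = loss_features(list(range(20)), toy_loss)
print(len(feats))  # 4 ratios x 4 summary statistics
```

In a real attack, `masked_loss` would wrap the fine-tuned MDLM, and the resulting vectors would be passed to a classifier such as XGBoost.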

Key facts

arXiv paper 2605.16445 studies membership inference attacks on fine-tuned MDLMs.
The attack uses a 46-dimensional feature vector extracted from reconstruction loss at four masking ratios.
XGBoost achieves a mean AUC of 0.878 on the MIMIR benchmark, peaking at 0.930 on Pile CC.
XGBoost beats the SAMA grey-box baseline by 0.062 AUC on average.
The ELBO trajectory drives most of the attack's success (mean drop of 0.130 when removed); attention features contribute less than 0.003.
Shadow model transfer attack uses K=3 surrogate MDLMs from unrelated domains.
MDLMs replace autoregressive generation with iterative demasking.
Privacy properties of MDLMs were largely unstudied before this work.
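
The leave-one-signal-out ablation listed above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not the paper's setup: the `GROUPS` layout, the Gaussian toy features, and the difference-of-means scorer (used here as a simple stand-in for XGBoost) are all assumptions made for the example. AUC is computed as the probability that a random member receives a higher score than a random non-member.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical signal groups mapped to feature columns (toy layout).
GROUPS: Dict[str, List[int]] = {"elbo": [0, 1], "attention": [2]}


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else 0.5 if m == n else 0.0
    return wins / (len(member_scores) * len(nonmember_scores))


def fit_weights(members, nonmembers, cols):
    """Difference-of-means linear scorer: a simple stand-in for XGBoost."""
    def mean(rows, c):
        return sum(r[c] for r in rows) / len(rows)
    return {c: mean(members, c) - mean(nonmembers, c) for c in cols}


def score(row, weights):
    return sum(w * row[c] for c, w in weights.items())


def evaluate(train, test, cols):
    # Fit on the training split, then measure AUC on the held-out split.
    w = fit_weights(train[0], train[1], cols)
    return auc([score(r, w) for r in test[0]], [score(r, w) for r in test[1]])


def make_split(rng, n):
    # Synthetic data: members get lower "elbo" values; "attention" is pure noise.
    members = [[rng.gauss(-1, 1), rng.gauss(-1, 1), rng.gauss(0, 1)] for _ in range(n)]
    nonmembers = [[rng.gauss(1, 1), rng.gauss(1, 1), rng.gauss(0, 1)] for _ in range(n)]
    return members, nonmembers


rng = random.Random(0)
train, test = make_split(rng, 200), make_split(rng, 200)
all_cols = [c for cols in GROUPS.values() for c in cols]
full_auc = evaluate(train, test, all_cols)

# Leave one signal group out at a time and record the AUC drop.
drops = {}
for name, cols in GROUPS.items():
    kept = [c for c in all_cols if c not in cols]
    drops[name] = full_auc - evaluate(train, test, kept)

for name, drop in drops.items():
    print(f"without {name}: AUC drop {drop:.3f}")
```

Because the toy "elbo" columns carry all of the membership signal, removing them should collapse the AUC toward chance, while removing the noise-only "attention" column should change it very little, mirroring the qualitative pattern the study reports.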
