Distillation attacks create deployment trade-off for AI models

ai-technology · 2026-05-23

A recent paper on arXiv examines the trade-off between a model's usefulness and its exposure to distillation attacks. The researchers formalize the problem as a minimax game between a utility-constrained teacher and an adaptive student. They introduce two one-sided response rules: an adaptive evaluation in which the student prioritizes high-value examples, and a teacher-side defense that aims to minimize outputs useful for distillation. Using a low-cost proxy for example value, they build Product-of-Experts (PoE), a forward-pass-only defense that combines the teacher with a proxy student during generation. Experiments on GSM8K and MATH show a large passive-adaptive gap: against state-of-the-art defenses, adaptive students recover far more capability than passive evaluation suggests. Under adaptive evaluation, the apparent robustness gap between expensive and cheap defenses narrows.
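To make the Product-of-Experts idea concrete, here is a minimal sketch of a generic product of experts over next-token distributions. It is not the paper's implementation. The function names, the `proxy_weight` parameter, and the choice of a negative weight to steer generation away from tokens the proxy student already predicts well are all assumptions made for illustration; the summary only states that the teacher and a proxy student are combined during generation using forward passes.

```python
import math


def log_softmax(logits):
    """Return normalized log-probabilities for a list of logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def softmax(logits):
    """Return a probability distribution for a list of logits."""
    return [math.exp(lp) for lp in log_softmax(logits)]


def poe_distribution(teacher_logits, proxy_logits, proxy_weight):
    """Generic product of experts: p(x) is proportional to
    p_teacher(x) * p_proxy(x) ** proxy_weight.

    ASSUMPTION: proxy_weight is a hypothetical knob, not a parameter from
    the paper. A value of 0 recovers the teacher; a negative value
    down-weights tokens the proxy student finds likely.
    """
    if len(teacher_logits) != len(proxy_logits):
        raise ValueError("teacher and proxy must share a vocabulary")
    teacher_lp = log_softmax(teacher_logits)
    proxy_lp = log_softmax(proxy_logits)
    # Multiplying probabilities is adding log-probabilities.
    combined = [t + proxy_weight * p for t, p in zip(teacher_lp, proxy_lp)]
    return softmax(combined)


# Toy three-token vocabulary with made-up logits.
teacher_logits = [2.0, 1.0, 0.0]
proxy_logits = [3.0, 0.0, 0.0]

plain = poe_distribution(teacher_logits, proxy_logits, proxy_weight=0.0)
defended = poe_distribution(teacher_logits, proxy_logits, proxy_weight=-0.5)
```

In this toy setup the first token, which the proxy strongly favors, receives less probability under `defended` than under `plain`. Both distributions need only forward-pass outputs (logits) from the two models, which matches the forward-pass-only character described above.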

Key facts

arXiv:2605.22737
Distillation attacks create deployment trade-off for model providers
Minimax game between utility-constrained teacher and adaptive student
One-sided response rules: adaptive evaluation and teacher-side defense
Product-of-Experts (PoE) defense combines teacher with proxy student
Empirical results on GSM8K and MATH datasets
Large passive-adaptive gap on state-of-the-art defenses
Apparent robustness gap narrows under adaptive evaluation
