Distillation attacks create deployment trade-off for AI models
A recent arXiv preprint (arXiv:2605.22737) examines the trade-off model providers face between a deployed model's utility and its exposure to distillation attacks. The researchers frame the problem as a minimax game between a utility-constrained teacher and an adaptive student. They introduce one-sided response rules for the two players: an adaptive evaluation in which the student prioritizes high-value examples, and a teacher-side defense that minimizes how useful the teacher's outputs are for distillation. Using a low-cost proxy for example value, they build Product-of-Experts (PoE), a defense that requires only forward passes and combines the teacher with a proxy student during generation. Experiments on GSM8K and MATH show a large passive-adaptive gap on state-of-the-art defenses: adaptive students recover far more capability than passive evaluation suggests. Under adaptive evaluation, the apparent robustness gap between expensive and cheap defenses narrows.
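To make the Product-of-Experts idea concrete, here is a minimal sketch of how a teacher and a proxy student could be combined at a single decoding step using forward-pass outputs only. It is an illustration under stated assumptions, not the paper's implementation: the function name `poe_next_token_probs`, the `weight` parameter, and the log-linear combination rule are all hypothetical, and the summary does not specify the sign or size of the weight or how the teacher's utility constraint is enforced.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def poe_next_token_probs(teacher_logits, proxy_logits, weight):
    """Generic product of experts: p(y) proportional to p_teacher(y) * p_proxy(y) ** weight.

    ASSUMPTION: this log-linear form and the `weight` knob are illustrative;
    the paper's exact combination rule may differ.
    weight = 0 recovers the undefended teacher distribution.
    """
    t = log_softmax(teacher_logits)
    s = log_softmax(proxy_logits)
    combined = [ti + weight * si for ti, si in zip(t, s)]
    # Renormalize the combined scores into a probability distribution.
    return [math.exp(x) for x in log_softmax(combined)]


# Toy three-token vocabulary; the logits are made up for illustration.
teacher = [2.0, 1.0, 0.0]
proxy = [5.0, 1.0, 0.0]
undefended = poe_next_token_probs(teacher, proxy, weight=0.0)
defended = poe_next_token_probs(teacher, proxy, weight=-0.5)
```

In this toy setup, a negative weight moves probability away from the token the proxy student already favors, while a weight of zero leaves the teacher unchanged. Both distributions come from ordinary forward passes, which is the property the summary attributes to PoE.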
Key facts
- arXiv:2605.22737
- Distillation attacks create deployment trade-off for model providers
- Minimax game between utility-constrained teacher and adaptive student
- One-sided response rules: adaptive evaluation and teacher-side defense
- Product-of-Experts (PoE) defense combines teacher with proxy student
- Empirical results on GSM8K and MATH datasets
- Large passive-adaptive gap on state-of-the-art defenses
- Apparent robustness gap narrows under adaptive evaluation
Entities
Institutions
- arXiv