Unlearnable Examples Fail Under Pretraining-Finetuning Paradigm
A new study from arXiv (2605.05224v1) presents the first systematic investigation of unlearnable examples (UEs) across diverse training paradigms. UEs embed imperceptible perturbations into benign data to obstruct feature learning, addressing privacy threats from the unauthorized use of personal data in model training. The research reveals that existing UE methods, which have mainly been evaluated under from-scratch training, are significantly weakened under the pretraining-finetuning (PF) paradigm, where pretrained weights are loaded and frozen. The authors explain this through semantic filtering: UEs induce models to overfit non-semantic noise, degrading their ability to extract semantics, but under PF the frozen shallow layers preserve data semantics and effectively filter out the perturbations. The findings highlight a critical gap in current UE defenses.
Key facts
- arXiv paper 2605.05224v1 provides the first systematic investigation of unlearnable examples across diverse training paradigms.
- Unlearnable examples embed imperceptible perturbations into benign examples to obstruct feature learning.
- Existing UE methods are mainly evaluated under from-scratch training settings.
- Loading and freezing pretrained weights significantly weakens existing UE methods.
- Semantic filtering explains the failure: frozen shallow layers preserve data semantics under the PF paradigm.
- UEs tend to induce models to overfit non-semantic noise, weakening semantic extraction.
- The study addresses privacy threats from unauthorized personal data use in model training.
- Research reveals a critical gap in current UE defenses.
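The semantic-filtering intuition can be illustrated with a toy sketch. This is not the paper's experimental setup: the data dimensions, noise magnitudes, and the frozen projection standing in for pretrained shallow layers are all hypothetical. The "UE perturbation" is modeled as an easy, class-correlated shortcut channel that a from-scratch learner latches onto, while a frozen "pretrained" extractor that keeps only the semantic dimensions filters it out before the finetuned head ever sees it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (for illustration only):
# - first d_sem dims carry the true class signal ("semantics")
# - remaining d_noise dims carry a class-correlated shortcut at train
#   time (the UE-style perturbation) that is pure noise at test time
n, d_sem, d_noise = 200, 2, 8
y = rng.integers(0, 2, n)
sem = (y[:, None] * 2 - 1) + 0.5 * rng.standard_normal((n, d_sem))
shortcut = 3.0 * (y[:, None] * 2 - 1) + 0.1 * rng.standard_normal((n, d_noise))
X_train = np.hstack([sem, shortcut])

# Test data: same semantics, but the shortcut channel carries no signal.
y_test = rng.integers(0, 2, n)
sem_t = (y_test[:, None] * 2 - 1) + 0.5 * rng.standard_normal((n, d_sem))
X_test = np.hstack([sem_t, rng.standard_normal((n, d_noise))])

def train_logreg(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression (the trainable head)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(((X @ w > 0) == y).mean())

# From-scratch training sees every dim and overfits the shortcut.
w_scratch = train_logreg(X_train, y)
acc_scratch = accuracy(w_scratch, X_test, y_test)

# "Pretraining-finetuning": a frozen projection keeps only the semantic
# dims (standing in for frozen shallow layers), then a head is
# finetuned on the filtered features.
P = np.zeros((d_sem + d_noise, d_sem))
P[:d_sem, :d_sem] = np.eye(d_sem)  # frozen extractor: drop noise dims
w_pf = train_logreg(X_train @ P, y)
acc_pf = accuracy(w_pf, X_test @ P, y_test)

print("from-scratch test accuracy:   ", acc_scratch)
print("frozen-extractor test accuracy:", acc_pf)
```

In this sketch the from-scratch model's weights concentrate on the shortcut dims (large, clean class correlation at train time), so its test predictions are dominated by noise, while the frozen projection never exposes those dims to the head. This mirrors, in miniature, the paper's claim that frozen shallow layers preserve semantics and filter out UE perturbations.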