Cluster Segregation Concealment: A New Defense Against Backdoor Attacks
A novel defense strategy, Cluster Segregation Concealment (CSC), has been introduced to counter poisoning-based backdoor attacks on deep neural networks. These attacks plant triggers in the training data so that triggered inputs are misclassified while the model remains accurate on clean data. Existing defenses often fail against particular attack types or degrade model accuracy. CSC builds on the observation that poisoned samples form distinct clusters in latent space early in training, with triggers acting as dominant features. The method trains a network through conventional supervised learning while isolating poisoned samples via feature extraction, aiming to suppress the poison without compromising model utility. The work is documented in a paper available on arXiv (2604.21416).
Key facts
- CSC stands for Cluster Segregation Concealment
- The defense targets poisoning-based backdoor attacks
- Poisoned samples form isolated clusters in latent space early in training
- Existing defenses suffer from inadequate detection and accuracy degradation
- The method trains a network via standard supervised learning
- Triggers act as dominant features distinct from benign ones
- The paper is available on arXiv with ID 2604.21416
- The approach aims to suppress poison without compromising model utility
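The clustering observation above can be illustrated with a toy sketch: if poisoned samples cluster separately in latent space, a simple 2-means split per class can surface the minority cluster as suspicious. This is not the paper's actual algorithm; the function names (`two_means`, `flag_suspicious`), the minority-size threshold, and the synthetic features are all hypothetical, stand-ins for the latent features a real network would produce.

```python
import numpy as np

def two_means(feats, n_iter=50, seed=0):
    """Minimal 2-means clustering of feature vectors (illustrative only).

    Initializes one center at a random point and the other at the point
    farthest from it, so well-separated clusters split reliably.
    """
    rng = np.random.default_rng(seed)
    c0 = feats[rng.integers(len(feats))]
    c1 = feats[np.linalg.norm(feats - c0, axis=1).argmax()]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        # Assign each point to its nearest center, then recompute centers.
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = feats[labels == k].mean(axis=0)
    return labels

def flag_suspicious(feats, max_fraction=0.35):
    """Split one class's latent features into two clusters and flag the
    smaller cluster as potentially poisoned, if it is small enough.

    Returns a boolean mask over the rows of `feats` (hypothetical
    threshold: a minority cluster holding at most 35% of the class).
    """
    labels = two_means(feats)
    sizes = np.bincount(labels, minlength=2)
    minority = sizes.argmin()
    if sizes[minority] / len(feats) <= max_fraction:
        return labels == minority
    return np.zeros(len(feats), dtype=bool)
```

On synthetic features where 20 of 100 samples sit in a well-separated offset cluster (standing in for trigger-dominated latents), `flag_suspicious` marks exactly that minority cluster; real latent spaces are of course noisier, which is why the paper's early-training observation matters.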
Entities
Institutions
- arXiv