CA-DSSL: Self-Supervised Learning for Sub-Megabyte MCU Models
Capacity-Aware Distilled Self-Supervised Learning (CA-DSSL) enables self-supervised pretraining for microcontroller-class (MCU) models with fewer than 500K parameters, a regime previously unexplored because of three obstacles: projection-head dominance, the representation bottleneck, and augmentation sensitivity. Using a frozen DINO ViT-S/16 teacher, CA-DSSL combines asymmetric distillation, multi-scale feature distillation, and a progressive augmentation curriculum. With a MobileNetV2-0.35 backbone (396K parameters) pretrained on CIFAR-100, CA-DSSL reaches 62.7% linear-probe accuracy (3-seed mean), outperforming SimCLR-Tiny by 18 percentage points and matching SEED (61.7%) with roughly 7× fewer projection parameters (426K vs. 3.15M), attaining 94.0% of a supervised upper bound. The method is label-free and text-free, making it well suited to resource-constrained edge devices.
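To make the asymmetric-distillation idea concrete (a frozen teacher, with the only projection head on the student side), here is a minimal PyTorch sketch. The `dino_vits16` hub model and its 384-d output are real; the `Student` wrapper, the 1280-d feature assumption, and the cosine loss are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen DINO ViT-S/16 teacher (384-d CLS embedding); never updated.
teacher = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

class Student(nn.Module):
    """MCU-class backbone plus a single small projection head.

    Asymmetric design: the teacher carries no head at all, so almost every
    trainable projection parameter lives in this one linear layer.
    feat_dim=1280 matches a stock MobileNetV2; adjust for a 0.35x variant.
    """
    def __init__(self, backbone: nn.Module, feat_dim: int = 1280, teacher_dim: int = 384):
        super().__init__()
        self.backbone = backbone
        self.proj = nn.Linear(feat_dim, teacher_dim)

    def forward(self, x):
        return self.proj(self.backbone(x))

def distill_loss(student_emb, teacher_emb):
    # Cosine alignment of L2-normalized embeddings (one plausible choice).
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    return (2 - 2 * (s * t).sum(dim=-1)).mean()
```

In practice the teacher view would be resized to the ViT's native resolution (e.g. 224×224) while the student sees 32×32 CIFAR crops; that split is an assumption here, since the summary does not state the input pipeline.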
Key facts
- CA-DSSL is a teacher-guided self-supervised learning framework for MCU-class models.
- It addresses three obstacles: projection-head dominance, the representation bottleneck, and augmentation sensitivity.
- Uses a frozen DINO ViT-S/16 teacher for asymmetric distillation.
- Employs multi-scale feature distillation and a progressive augmentation curriculum (both sketched in code after this list).
- Tested on MobileNetV2-0.35 backbone with 396K parameters.
- Pretrained on CIFAR-100 dataset.
- Achieves 62.7% linear-probe accuracy (3-seed mean; probe protocol sketched after this list).
- Surpasses SimCLR-Tiny by 18 percentage points.
- Matches SEED (61.7%) using 426K projection parameters vs. SEED's 3.15M.
- Reaches 94.0% of a supervised upper bound.
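Multi-scale feature distillation, referenced in the list above, taps intermediate student stages rather than only the final embedding. A minimal sketch assuming 1×1-conv adapters and the pooled teacher embedding as the target at every scale; the actual tap points, adapter shapes, and per-scale weighting are not given in the summary.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDistill(nn.Module):
    def __init__(self, student_dims, teacher_dim=384):
        super().__init__()
        # One lightweight 1x1-conv adapter per tapped student stage,
        # mapping each feature map into the teacher's embedding space.
        self.adapters = nn.ModuleList(
            nn.Conv2d(d, teacher_dim, kernel_size=1) for d in student_dims
        )

    def forward(self, student_feats, teacher_emb):
        # student_feats: list of (B, C_i, H_i, W_i); teacher_emb: (B, teacher_dim).
        loss = 0.0
        for adapter, f in zip(self.adapters, student_feats):
            pooled = adapter(f).mean(dim=(2, 3))  # global average pool
            loss = loss + F.mse_loss(
                F.normalize(pooled, dim=-1),
                F.normalize(teacher_emb, dim=-1),
            )
        return loss / len(self.adapters)

# Hypothetical tap channels for a MobileNetV2-0.35-style backbone:
# msd = MultiScaleDistill(student_dims=[16, 32, 112])
```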
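The progressive augmentation curriculum starts the low-capacity student on weak views and ramps toward full-strength augmentations, rather than applying SimCLR-strength views from step one. A sketch with an assumed linear schedule and assumed op ranges; the summary names the curriculum but not its parameters.

```python
import torchvision.transforms as T

def curriculum_augment(epoch: int, total_epochs: int) -> T.Compose:
    # Strength s ramps linearly from 0 to 1 over the first half of training
    # (the schedule shape is an assumption, not the paper's schedule).
    s = min(1.0, epoch / (0.5 * total_epochs))
    return T.Compose([
        T.RandomResizedCrop(32, scale=(1.0 - 0.7 * s, 1.0)),
        T.RandomHorizontalFlip(),
        T.RandomApply([T.ColorJitter(0.4 * s, 0.4 * s, 0.4 * s, 0.1 * s)], p=0.8 * s),
        T.RandomGrayscale(p=0.2 * s),
        T.ToTensor(),
    ])
```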
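The 62.7% number comes from the standard linear-probe protocol: freeze the pretrained backbone and train only a linear classifier on labeled CIFAR-100. A generic sketch of that protocol; the optimizer, learning rate, and epoch count here are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe(backbone, feat_dim, num_classes, train_loader, epochs=30, device='cuda'):
    # Freeze the pretrained backbone; only the linear head is trained.
    backbone.to(device).eval()
    for p in backbone.parameters():
        p.requires_grad_(False)

    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = backbone(x)  # fixed representations
            loss = F.cross_entropy(clf(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```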