CA-DSSL: Self-Supervised Learning for Sub-Megabyte MCU Models
Capacity-Aware Distilled Self-Supervised Learning (CA-DSSL) enables self-supervised pretraining for microcontroller-class (MCU) models with fewer than 500K parameters, a regime previously unexplored because of three obstacles: projection-head dominance, the representation bottleneck, and augmentation sensitivity. Using a frozen DINO ViT-S/16 teacher, CA-DSSL combines asymmetric distillation, multi-scale feature distillation, and a progressive augmentation curriculum. With a MobileNetV2-0.35 backbone (396K parameters) pretrained on CIFAR-100, CA-DSSL reaches 62.7% linear-probe accuracy (3-seed mean), outperforming SimCLR-Tiny by 18 percentage points and matching SEED (61.7%) with roughly 7× fewer projection parameters (426K vs. 3.15M), attaining 94.0% of a supervised upper bound. The method is label-free and text-free, making it well suited to resource-constrained edge devices.
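To make the asymmetric-distillation idea concrete (a frozen teacher, with the only projection head on the student side), here is a minimal PyTorch sketch. The `dino_vits16` hub model and its 384-d output are real; the `Student` wrapper, the 1280-d feature assumption, and the cosine loss are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen DINO ViT-S/16 teacher (384-d CLS embedding); never updated.
teacher = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

class Student(nn.Module):
    """MCU-class backbone plus a single small projection head.

    Asymmetric design: the teacher carries no head at all, so almost every
    trainable projection parameter lives in this one linear layer.
    feat_dim=1280 matches a stock MobileNetV2; adjust for a 0.35x variant.
    """
    def __init__(self, backbone: nn.Module, feat_dim: int = 1280, teacher_dim: int = 384):
        super().__init__()
        self.backbone = backbone
        self.proj = nn.Linear(feat_dim, teacher_dim)

    def forward(self, x):
        return self.proj(self.backbone(x))

def distill_loss(student_emb, teacher_emb):
    # Cosine alignment of L2-normalized embeddings (one plausible choice).
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    return (2 - 2 * (s * t).sum(dim=-1)).mean()
```

In practice the teacher view would be resized to the ViT's native resolution (e.g. 224×224) while the student sees 32×32 CIFAR crops; that split is an assumption here, since the summary does not state the input pipeline.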
Key facts
- CA-DSSL is a teacher-guided self-supervised learning framework for MCU-class models.
- It addresses three obstacles: projection-head dominance, the representation bottleneck, and augmentation sensitivity.
- Uses a frozen DINO ViT-S/16 teacher for asymmetric distillation.
- Employs multi-scale feature distillation and a progressive augmentation curriculum (both sketched in code after this list).
- Tested on MobileNetV2-0.35 backbone with 396K parameters.
- Pretrained on CIFAR-100 dataset.
- Achieves 62.7% linear-probe accuracy (3-seed mean; probe protocol sketched after this list).
- Surpasses SimCLR-Tiny by 18 percentage points.
- Matches SEED (61.7%) using 426K projection parameters vs. SEED's 3.15M.
- Reaches 94.0% of a supervised upper bound.
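Multi-scale feature distillation, referenced in the list above, taps intermediate student stages rather than only the final embedding. A minimal sketch assuming 1×1-conv adapters and the pooled teacher embedding as the target at every scale; the actual tap points, adapter shapes, and per-scale weighting are not given in the summary.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDistill(nn.Module):
    def __init__(self, student_dims, teacher_dim=384):
        super().__init__()
        # One lightweight 1x1-conv adapter per tapped student stage,
        # mapping each feature map into the teacher's embedding space.
        self.adapters = nn.ModuleList(
            nn.Conv2d(d, teacher_dim, kernel_size=1) for d in student_dims
        )

    def forward(self, student_feats, teacher_emb):
        # student_feats: list of (B, C_i, H_i, W_i); teacher_emb: (B, teacher_dim).
        loss = 0.0
        for adapter, f in zip(self.adapters, student_feats):
            pooled = adapter(f).mean(dim=(2, 3))  # global average pool
            loss = loss + F.mse_loss(
                F.normalize(pooled, dim=-1),
                F.normalize(teacher_emb, dim=-1),
            )
        return loss / len(self.adapters)

# Hypothetical tap channels for a MobileNetV2-0.35-style backbone:
# msd = MultiScaleDistill(student_dims=[16, 32, 112])
```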
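The progressive augmentation curriculum starts the low-capacity student on weak views and ramps toward full-strength augmentations, rather than applying SimCLR-strength views from step one. A sketch with an assumed linear schedule and assumed op ranges; the summary names the curriculum but not its parameters.

```python
import torchvision.transforms as T

def curriculum_augment(epoch: int, total_epochs: int) -> T.Compose:
    # Strength s ramps linearly from 0 to 1 over the first half of training
    # (the schedule shape is an assumption, not the paper's schedule).
    s = min(1.0, epoch / (0.5 * total_epochs))
    return T.Compose([
        T.RandomResizedCrop(32, scale=(1.0 - 0.7 * s, 1.0)),
        T.RandomHorizontalFlip(),
        T.RandomApply([T.ColorJitter(0.4 * s, 0.4 * s, 0.4 * s, 0.1 * s)], p=0.8 * s),
        T.RandomGrayscale(p=0.2 * s),
        T.ToTensor(),
    ])
```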
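The 62.7% number comes from the standard linear-probe protocol: freeze the pretrained backbone and train only a linear classifier on labeled CIFAR-100. A generic sketch of that protocol; the optimizer, learning rate, and epoch count here are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def linear_probe(backbone, feat_dim, num_classes, train_loader, epochs=30, device='cuda'):
    # Freeze the pretrained backbone; only the linear head is trained.
    backbone.to(device).eval()
    for p in backbone.parameters():
        p.requires_grad_(False)

    clf = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)

    for _ in range(epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                feats = backbone(x)  # fixed representations
            loss = F.cross_entropy(clf(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return clf
```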