ARTFEED — Contemporary Art Intelligence

CA-DSSL: Self-Supervised Learning for Sub-Megabyte MCU Models

ai-technology · 2026-05-12

Capacity-Aware Distilled Self-Supervised Learning (CA-DSSL) enables self-supervised pretraining for microcontroller-class (MCU) models with fewer than 500K parameters, a regime previously unexplored because of three obstacles: projection-head dominance, a representation bottleneck, and augmentation sensitivity. Using a frozen DINO ViT-S/16 teacher, CA-DSSL combines asymmetric distillation, multi-scale feature distillation, and a progressive augmentation curriculum. With a MobileNetV2-0.35 backbone (396K parameters) pretrained on CIFAR-100, CA-DSSL reaches 62.7% linear-probe accuracy (3-seed mean), outperforming SimCLR-Tiny by 18 percentage points and matching SEED (61.7%) with far fewer projection parameters (426K vs. 3.15M), while attaining 94.0% of a supervised upper bound. The method is both label-free and text-free, making it well suited to resource-constrained edge devices.
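The asymmetric-distillation idea described above can be sketched in a few lines: the tiny student keeps its own narrow embedding, and only a small linear projection head maps it into the frozen teacher's 384-dim DINO ViT-S/16 space, where an alignment loss is applied. This is a minimal NumPy sketch under assumptions: the 64-dim student width and the cosine-distance objective are illustrative, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

STUDENT_DIM = 64   # tiny MCU-backbone embedding width (assumed for illustration)
TEACHER_DIM = 384  # DINO ViT-S/16 embedding width

# Small linear projection head: the only distillation-specific parameters the
# student carries (contrast with SEED's multi-megabyte projection head).
W = rng.normal(scale=0.02, size=(STUDENT_DIM, TEACHER_DIM))

def cosine_distill_loss(student_feat, teacher_feat, W):
    """1 - cosine similarity between projected student and frozen teacher."""
    z = student_feat @ W                      # project into teacher space
    z = z / np.linalg.norm(z)
    t = teacher_feat / np.linalg.norm(teacher_feat)
    return 1.0 - float(z @ t)

student_feat = rng.normal(size=STUDENT_DIM)   # backbone output for one image
teacher_feat = rng.normal(size=TEACHER_DIM)   # frozen teacher output (no grad)

loss = cosine_distill_loss(student_feat, teacher_feat, W)
print(round(loss, 4))  # loss lies in [0, 2]; 0 means perfect alignment
```

The asymmetry is the point: gradients flow only through the student and its small head, while the teacher stays frozen, so the student never has to match the teacher's capacity, only its embedding geometry.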

Key facts

  • CA-DSSL is a teacher-guided self-supervised learning framework for MCU-class models.
  • It addresses three obstacles: projection head dominance, representation bottleneck, and augmentation sensitivity.
  • Uses a frozen DINO ViT-S/16 teacher for asymmetric distillation.
  • Employs multi-scale feature distillation and a progressive augmentation curriculum.
  • Tested on MobileNetV2-0.35 backbone with 396K parameters.
  • Pretrained on CIFAR-100 dataset.
  • Achieves 62.7% linear-probe accuracy (3-seed mean).
  • Surpasses SimCLR-Tiny by 18 percentage points.
  • Matches SEED (61.7%) with 426K vs. 3.15M projection parameters.
  • Reaches 94.0% of a supervised upper bound.
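The progressive augmentation curriculum from the key facts can be as simple as a per-epoch strength schedule that ramps from weak to full augmentation, reflecting the augmentation sensitivity of sub-megabyte models early in training. The linear ramp, the 50-epoch warmup, and the 0.2 floor below are illustrative assumptions, not values from the paper.

```python
def augmentation_strength(epoch, warmup_epochs=50, min_strength=0.2):
    """Linearly ramp augmentation strength from min_strength up to 1.0.

    A tiny model is sensitive to strong augmentation early in training, so
    crop scale and color-jitter magnitudes would be multiplied by this factor.
    """
    if epoch >= warmup_epochs:
        return 1.0
    frac = epoch / warmup_epochs
    return min_strength + (1.0 - min_strength) * frac

# Strength grows as pretraining progresses:
print(augmentation_strength(0))                  # 0.2 (gentle start)
print(round(augmentation_strength(25), 3))       # 0.6 (halfway up the ramp)
print(augmentation_strength(100))                # 1.0 (full augmentation)
```

A curriculum like this leaves the augmentation pipeline itself unchanged; only a single scalar schedule is added, which matches the paper's emphasis on keeping the training recipe cheap for MCU-class students.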

Entities

Institutions

  • arXiv