Canonical Knowledge Distillation Outperforms Complex Methods in Semantic Segmentation
A recent arXiv study (2604.25530) finds that simple, canonical knowledge distillation (KD) techniques for semantic segmentation outperform more intricate, hand-crafted objectives when training compute is matched. The authors note that iteration-based comparisons can be misleading, since complex methods tend to raise the cost per iteration. With longer training, feature-based distillation reaches state-of-the-art performance for ResNet-18 on Cityscapes and ADE20K. A PSPNet ResNet-18 student with only a quarter of its ResNet-101 teacher's parameters attains 99% of the teacher's mIoU on Cityscapes (79.0 vs. 79.8) and 92% on ADE20K.
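The canonical logit objective the study refers to is the standard temperature-scaled KL divergence, applied per pixel for segmentation. Below is a minimal PyTorch sketch under the assumption that teacher and student emit per-pixel class logits of shape (B, C, H, W); the function name and temperature value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, T=4.0):
    """Pixel-wise KL divergence between softened teacher and student distributions."""
    # Flatten the spatial dimensions so every pixel is treated as one classification.
    B, C, H, W = student_logits.shape
    s = student_logits.permute(0, 2, 3, 1).reshape(-1, C)
    t = teacher_logits.permute(0, 2, 3, 1).reshape(-1, C)
    log_p_s = F.log_softmax(s / T, dim=1)
    p_t = F.softmax(t / T, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```

In typical use this term is added to the ordinary cross-entropy loss on the ground-truth labels with a weighting coefficient; the exact weighting used by the authors is not specified here.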
Key facts
- Canonical logit- and feature-based KD outperform recent segmentation-specific methods under matched compute.
- Feature-based distillation achieves state-of-the-art ResNet-18 performance on Cityscapes and ADE20K (see the feature-matching sketch after this list).
- PSPNet ResNet-18 student reaches 99% of teacher's mIoU on Cityscapes (79.0 vs 79.8).
- PSPNet ResNet-18 student reaches 92% of teacher's mIoU on ADE20K.
- Student uses only one quarter of the parameters of the teacher.
- Iteration-based comparisons are misleading because complex methods raise per-iteration cost, so equal iteration counts imply unequal compute budgets.
- Study published on arXiv with ID 2604.25530.
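Feature-based distillation, the strongest method under matched compute, matches intermediate feature maps of student and teacher. A minimal sketch follows, assuming access to one intermediate feature map from each network; the 1x1 projection, layer choice, and plain MSE criterion are assumptions for illustration, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureKD(nn.Module):
    """Align a student feature map with a teacher feature map and penalize the gap."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # Project student features into the teacher's channel dimension.
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        s = self.proj(student_feat)
        # Match spatial resolution if the two backbones downsample differently.
        if s.shape[-2:] != teacher_feat.shape[-2:]:
            s = F.interpolate(s, size=teacher_feat.shape[-2:], mode="bilinear",
                              align_corners=False)
        # Per-element MSE against the teacher features, with no gradient to the teacher.
        return F.mse_loss(s, teacher_feat.detach())
```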