SAM 3 and DINOv3 Distilled for Edge-Deployable Livestock Monitoring
A recent arXiv article (2604.27128) describes distilling the 446 million-parameter Perception Encoder backbone of SAM 3 into a 40.66 million-parameter student model for tracking individual livestock on edge devices. The student pairs a TinyViT-21M-512 backbone with a Feature Pyramid Network and is trained with a four-term direction-then-scale distillation loss. At inference, backbone-substitution combined with sliding-window session pruning keeps streaming GPU memory bounded. Separately, the DINOv3 series includes a pre-distilled ViT-S/16 model with 21.6 million parameters, released alongside a 6716 million-parameter ViT-7B teacher, making it a practical choice for precision livestock farming on budget-friendly hardware.
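The "direction-then-scale" idea can be illustrated with a minimal sketch: per feature level, one term aligns the *direction* of student and teacher token features (cosine distance) and one term aligns their *scale* (norm mismatch), so two feature levels yield four terms. This is an assumption about how the four terms decompose; the function name, weights, and exact formulation are illustrative, not the paper's.

```python
import numpy as np

def direction_scale_distill_loss(student_feats, teacher_feats,
                                 w_dir=1.0, w_scale=1.0):
    """Hypothetical sketch of a direction-then-scale distillation loss.

    student_feats / teacher_feats: lists of (B, C, H, W) feature maps,
    one per pyramid level. Two levels x two terms = four loss terms.
    """
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        # Flatten spatial dims: (B, C, H, W) -> (B*H*W, C) token features.
        s2 = s.transpose(0, 2, 3, 1).reshape(-1, s.shape[1])
        t2 = t.transpose(0, 2, 3, 1).reshape(-1, t.shape[1])
        s_norm = np.linalg.norm(s2, axis=1) + 1e-8
        t_norm = np.linalg.norm(t2, axis=1) + 1e-8
        # Direction term: mean cosine distance between token features.
        cos = np.sum(s2 * t2, axis=1) / (s_norm * t_norm)
        dir_term = float(np.mean(1.0 - cos))
        # Scale term: mean absolute mismatch of per-token feature norms.
        scale_term = float(np.mean(np.abs(s_norm - t_norm)))
        total += w_dir * dir_term + w_scale * scale_term
    return total
```

Decoupling direction from scale lets the student match the teacher's feature geometry even when the two backbones produce activations of very different magnitudes.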
Key facts
- arXiv paper 2604.27128
- SAM 3 Perception Encoder distilled from 446M to 40.66M parameters
- Student encoder uses TinyViT-21M-512 with Feature Pyramid Network
- Four-term direction-then-scale distillation loss used
- Sliding-window session pruning bounds streaming GPU memory
- DINOv3 ViT-S/16 variant has 21.6M parameters
- DINOv3 ViT-7B teacher has 6716M parameters
- DINOv3 ViT-S (21M) adopted as the per-individual embedder
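The sliding-window session pruning above can be sketched as a per-individual memory that retains only the most recent frames, so memory use stays constant regardless of video length. The class and method names are hypothetical; only the bounded-window idea comes from the source.

```python
from collections import deque

class SlidingWindowSession:
    """Hypothetical sketch of sliding-window session pruning.

    Each tracked individual keeps at most `window` frames of memory
    features; older entries are pruned automatically, bounding
    streaming memory for arbitrarily long videos.
    """

    def __init__(self, window=8):
        self.window = window
        self.memory = {}  # track_id -> deque of per-frame features

    def update(self, track_id, frame_feature):
        # deque(maxlen=...) silently evicts the oldest entry when full.
        if track_id not in self.memory:
            self.memory[track_id] = deque(maxlen=self.window)
        self.memory[track_id].append(frame_feature)

    def num_stored(self, track_id):
        return len(self.memory.get(track_id, ()))
```

For example, after streaming 20 frames for one animal with `window=8`, only the last 8 frame features remain in memory.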
Entities
Institutions
- arXiv