Pretraining Objective Impacts Low-Data Fine-Grained Classification

other · 2026-05-18

Researchers compared four frozen ViT-B/16 encoders on the task of grading inclusions in emeralds. The encoders share the same backbone capacity but were pretrained with different objectives: supervised classification, contrastive learning, masked reconstruction, and self-distillation. Using a specialized dataset of labeled images in three classes, the study measured representation quality with leave-one-out cross-validation under both linear and nonlinear probes, and assessed significance with a 1,000-iteration permutation test. The supervised and contrastive encoders gave the highest linear separability, while the masked-reconstruction encoder (MAE) improved under nonlinear probes. These findings can guide the choice of a pretrained encoder for fine-grained classification when labeled data is scarce.
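A minimal, dependency-free sketch of the evaluation protocol described above, under stated assumptions. Embeddings from a frozen encoder are probed with leave-one-out cross-validation, the out-of-fold class probabilities are scored with macro one-vs-rest AUC, and significance comes from a label-permutation test that reruns the whole LOOCV pipeline on each shuffle. The probe is a nearest-centroid classifier (a softmax over linear class scores), swapped in for the paper's logistic-regression and SVM probes to keep the code stdlib-only. The function names, the add-one p-value convention, and the synthetic embeddings are illustrative assumptions, not details from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

# Assumption: a nearest-centroid probe (softmax over linear class scores) stands in
# for the paper's logistic-regression and SVM probes so the sketch stays stdlib-only.


def linear_scores(x: Sequence[float], centroids: Sequence[Vector]) -> Vector:
    """Per-class score 2*mu_c.x - ||mu_c||^2, i.e. -||x - mu_c||^2 shifted by ||x||^2."""
    return [2.0 * sum(m * v for m, v in zip(mu, x)) - sum(m * m for m in mu)
            for mu in centroids]


def softmax(z: Sequence[float]) -> Vector:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def loocv_scores(X: Sequence[Vector], y: Sequence[int], k: int) -> List[Vector]:
    """Out-of-fold class probabilities: each sample is scored by centroids fit without it."""
    d = len(X[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x, c in zip(X, y):
        counts[c] += 1
        for j in range(d):
            sums[c][j] += x[j]
    out = []
    for x, c_held in zip(X, y):
        centroids = []
        for c in range(k):
            if c == c_held:  # drop the held-out sample from its own class mean
                centroids.append([(sums[c][j] - x[j]) / (counts[c] - 1) for j in range(d)])
            else:
                centroids.append([sums[c][j] / counts[c] for j in range(d)])
        out.append(softmax(linear_scores(x, centroids)))
    return out


def macro_ovr_auc(scores: Sequence[Vector], y: Sequence[int], k: int) -> float:
    """Mean over classes of the one-vs-rest AUC (Mann-Whitney form, ties count 0.5)."""
    aucs = []
    for c in range(k):
        pos = [s[c] for s, lab in zip(scores, y) if lab == c]
        neg = [s[c] for s, lab in zip(scores, y) if lab != c]
        wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
        aucs.append(wins / (len(pos) * len(neg)))
    return sum(aucs) / k


def permutation_test(X: Sequence[Vector], y: List[int], k: int,
                     n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Observed LOOCV macro AUC and add-one permutation p-value.

    Each permutation shuffles the labels and reruns the entire LOOCV pipeline,
    so the null distribution reflects the full evaluation procedure.
    """
    if any(y.count(c) < 2 for c in range(k)):
        raise ValueError("leave-one-out needs at least two samples per class")
    observed = macro_ovr_auc(loocv_scores(X, y, k), y, k)
    rng = random.Random(seed)
    perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # class counts are preserved, so every fold stays valid
        if macro_ovr_auc(loocv_scores(X, perm, k), perm, k) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic stand-in for frozen embeddings: 3 classes x 10 samples, 4 dims.
    rng = random.Random(42)
    X, y = [], []
    for c in range(3):
        for _ in range(10):
            X.append([(3.0 if j == c else 0.0) + rng.gauss(0.0, 0.5) for j in range(4)])
            y.append(c)
    auc, p = permutation_test(X, y, k=3, n_perm=1000)
    print(f"macro OvR AUC = {auc:.3f}, permutation p = {p:.4f}")
```

Running the script prints the observed macro AUC and its p-value for the synthetic three-class data. A nonlinear probe, such as an RBF-kernel SVM or a small MLP, would replace `loocv_scores` without changing the AUC or permutation logic; that variant is not shown.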

Key facts

Study focuses on emerald inclusion grading with a custom dataset of labeled images across three classes.
Compares four frozen ViT-B/16 encoders: a supervised-classification model, SigLIP2 (contrastive), MAE (masked reconstruction), and DINOv3 (self-distillation).
Evaluation uses leave-one-out cross-validation with linear and nonlinear probes.
Permutation testing (N = 1,000) on macro one-vs-rest AUC checks whether each probe's performance exceeds chance.
Supervised and contrastive encoders provide the strongest linear separability (logistic-regression AUC: 0.768 and 0.735; SVM AUC: 0.739 and 0.697, respectively).
MAE improves under nonlinear probes.
Published on arXiv under ID 2605.15599.
Study addresses extreme low-data fine-grained classification in expert domains.
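The AUC figures and the permutation test rest on the following standard definitions, written out here as an assumption because the summary does not give the paper's exact formulas. In this sketch, s_ic is the out-of-fold probe score for class c on image i, P_c is the set of images labeled c, N_c is the set of images with any other label, K = 3 is the number of classes, and AUC^(b) is recomputed after the b-th of N = 1,000 label shuffles.

```latex
\[
\mathrm{AUC}_c = \frac{1}{|P_c|\,|N_c|} \sum_{i \in P_c} \sum_{j \in N_c}
  \Big( \mathbb{1}[s_{ic} > s_{jc}] + \tfrac{1}{2}\,\mathbb{1}[s_{ic} = s_{jc}] \Big),
\qquad
\mathrm{AUC}_{\mathrm{macro}} = \frac{1}{K} \sum_{c=1}^{K} \mathrm{AUC}_c
\]
\[
p = \frac{1 + \sum_{b=1}^{N} \mathbb{1}\big[\mathrm{AUC}_{\mathrm{macro}}^{(b)} \ge \mathrm{AUC}_{\mathrm{macro}}^{\mathrm{obs}}\big]}{N + 1}
\]
```

Chance level is 0.5 for every per-class AUC and therefore for the macro average. Under this add-one convention, N = 1,000 permutations give a smallest attainable p-value of 1/1001, roughly 0.001.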
