PictSure: Pretraining Embeddings Key for In-Context Learning Image Classifiers
A new research paper, 'PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers,' published on arXiv (2506.14842v2), investigates which factors influence in-context learning (ICL) for few-shot image classification (FSIC). The authors introduce PictSure, a family of vision-only ICL models built on fusion transformer architectures. Their experiments show that the quality of the pretrained encoder's embeddings correlates strongly with downstream ICL performance, both in-domain and out-of-domain. In contrast, varying the fusion transformer's training dataset, from ImageNet alone to diverse multi-domain mixtures, yields limited additional gains. The study concludes that the quality of the pretrained representations matters more than the diversity of the fusion-layer training data for effective ICL in image classification.
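To make the few-shot ICL setting concrete, here is a minimal sketch of an N-way K-shot episode over precomputed embeddings. It is not PictSure's method: it assumes the embeddings come from some frozen encoder, and it swaps the fusion transformer for a simple nearest-centroid rule as a stand-in. All names (`toy_embeddings`, `make_episode`, `classify_by_centroid`, `episode_accuracy`) and the toy 2-D data are hypothetical, chosen only for illustration.

```python
import math
import random


def toy_embeddings(spread, per_class, rng):
    """Hypothetical 2-D 'embeddings': three classes scattered around fixed centers."""
    centers = {"cat": (0.0, 0.0), "dog": (10.0, 0.0), "bird": (0.0, 10.0)}
    return {
        label: [[rng.gauss(cx, spread), rng.gauss(cy, spread)] for _ in range(per_class)]
        for label, (cx, cy) in centers.items()
    }


def make_episode(embeddings_by_class, n_way, k_shot, rng):
    """Sample an N-way K-shot support set plus one held-out query embedding."""
    classes = rng.sample(sorted(embeddings_by_class), n_way)
    query_class = rng.choice(classes)
    support, query = [], None
    for label in classes:
        extra = 1 if label == query_class else 0
        picked = rng.sample(embeddings_by_class[label], k_shot + extra)
        support.extend((vec, label) for vec in picked[:k_shot])
        if extra:
            query = picked[k_shot]  # never appears in the support set
    return support, query, query_class


def classify_by_centroid(support, query):
    """Stand-in for the ICL model: predict the class whose mean embedding is nearest."""
    sums, counts = {}, {}
    for vec, label in support:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [x / counts[label] for x in acc]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def episode_accuracy(embeddings_by_class, n_way, k_shot, episodes, rng):
    """Fraction of sampled episodes in which the query is classified correctly."""
    correct = 0
    for _ in range(episodes):
        support, query, label = make_episode(embeddings_by_class, n_way, k_shot, rng)
        correct += classify_by_centroid(support, query) == label
    return correct / episodes
```

Calling `episode_accuracy` on data generated with a small versus a large `spread` gives a feel for how the separability of the embeddings affects few-shot accuracy in this toy setup. It says nothing about the numbers reported in the paper.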
Key facts
- Paper titled 'PictSure: Pretraining Embeddings Matters for In-Context Learning Image Classifiers'
- Published on arXiv with ID 2506.14842v2
- Introduces PictSure, a vision-only ICL model family
- Uses fusion transformer architectures
- Finds pretraining embedding quality strongly correlates with ICL performance
- Varying fusion transformer training data (ImageNet vs. multi-domain mixtures) provides limited gains
- Evaluated in both in-domain and out-of-domain settings
- Focuses on few-shot image classification (FSIC)
Entities
Institutions
- arXiv