ARTFEED — Contemporary Art Intelligence

Semantic Probe Improves CLIP Fine-Tuning for Cross-Domain Few-Shot Learning

other · 2026-05-13

A recent study published on arXiv (2605.11659) examines fine-tuning strategies for CLIP in Cross-Domain Few-Shot Learning (CDFSL). The researchers find that adapter-based techniques such as LoRA outperform prompt-based approaches such as MaPLe, the opposite of what is typically observed in in-domain settings. They attribute LoRA's effectiveness to its rectification of the visual CLS token's collapsed attention, which improves both modality alignment and class differentiation by focusing on text-relevant visual regions. They further observe that the textual EOS token attends more effectively to visual samples, while CLIP's standard contrastive loss only weakly constrains modality alignment. Building on these findings, they introduce Semantic Probe, a plug-and-play attention mechanism designed to revitalize in-domain fine-tuning methods in CDFSL.
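For readers unfamiliar with the adapter family the study favors, here is a minimal numpy sketch of the general LoRA idea: a frozen weight matrix W is left untouched, and a trainable low-rank update B @ A is added alongside it. This is an illustration of the standard LoRA formulation, not code from the paper; all names and shapes here are hypothetical.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """LoRA-style forward pass: x @ (W + alpha * B @ A).T, computed as two paths.

    x: (batch, d_in) inputs
    W: (d_out, d_in) frozen pretrained weights
    A: (r, d_in), B: (d_out, r) trainable low-rank factors, r << min(d_in, d_out)
    """
    base = x @ W.T              # frozen pretrained path
    update = (x @ A.T) @ B.T    # trainable low-rank path
    return base + alpha * update

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))        # B starts at zero, so the adapter initially changes nothing
x = rng.normal(size=(4, d_in))

# With B = 0 the adapted output equals the frozen model's output.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

During fine-tuning only A and B receive gradients, which is why LoRA can reshape attention behavior (as the paper argues for the visual CLS token) while the pretrained CLIP weights stay frozen.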

Key facts

  • arXiv paper 2605.11659
  • Cross-Domain Few-Shot Learning (CDFSL)
  • Adapter-based methods (e.g., LoRA) outperform prompt-based (e.g., MaPLe)
  • LoRA rectifies collapsed attention of visual CLS token
  • Textual EOS token shows better attention to visual samples
  • CLIP's contrastive loss weakly constrains modality alignment
  • Proposed method: Semantic Probe
  • Semantic Probe is plug-and-play attention mechanism
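The claim that CLIP's contrastive loss only weakly constrains modality alignment refers to its standard symmetric InfoNCE objective, which only pushes matched image-text pairs above in-batch negatives. A minimal numpy sketch of that objective, assuming the standard CLIP formulation rather than the paper's code:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (batch, d) arrays; matched pairs share a row index.
    """
    # L2-normalize so the similarity matrix holds cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (batch, batch); diagonal = positives
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()        # diagonal entries

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Because the loss is satisfied as soon as each pair's similarity dominates the batch, it leaves the fine-grained geometry of the two modalities underconstrained, which is the gap Semantic Probe is positioned to address.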

Entities

Institutions

  • arXiv

Sources