Generalized Category Discovery Under Domain Shifts Using Foundation Models
A recent arXiv preprint (2605.00906) presents three frameworks for Generalized Category Discovery (GCD) under domain shifts, adapting foundation models that range from self-supervised vision backbones to vision-language models. The first approach, HiLo, disentangles domain and semantic attributes through multi-level feature extraction, mutual information minimization, PatchMix augmentation, and curriculum sampling. Building on HiLo, HLPrompt adds semantic-aware spatial prompt tuning to mitigate background and domain interference. VLPrompt instead leverages vision-language models, combining factorized textual prompts with cross-modal consistency regularization. All three frameworks share a common set of design principles. The work addresses the practical setting in which unlabelled data exhibits both domain and semantic shifts, relaxing the single-domain assumption of prior GCD methods.
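The patch-level mixing idea behind PatchMix can be illustrated with a minimal sketch: swap a random subset of square patches from one image into another, and keep the mixing ratio for proportional label interpolation. This is a generic illustration, not the preprint's exact implementation; the function name, patch size, and ratio parameter are assumptions.

```python
import numpy as np

def patchmix(img_a, img_b, patch=4, ratio=0.3, rng=None):
    """Swap a random subset of square patches from img_b into img_a.

    img_a, img_b: arrays of shape (H, W, C) with H and W divisible by `patch`.
    ratio: target fraction of patches taken from img_b.
    Returns the mixed image and the realized mixing ratio (for label mixing).
    """
    rng = rng or np.random.default_rng()
    h, w, _ = img_a.shape
    gh, gw = h // patch, w // patch          # patch-grid dimensions
    n = gh * gw
    k = int(round(ratio * n))                # number of patches to swap
    idx = rng.choice(n, size=k, replace=False)
    mixed = img_a.copy()
    for i in idx:
        r, c = divmod(i, gw)
        rs, cs = r * patch, c * patch
        mixed[rs:rs + patch, cs:cs + patch] = img_b[rs:rs + patch, cs:cs + patch]
    return mixed, k / n
```

In a cross-domain setting, `img_a` and `img_b` would typically come from different domains, so the mixed sample interpolates domain statistics as well as content.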
Key facts
- arXiv preprint 2605.00906
- Three frameworks for GCD under domain shifts
- HiLo uses multi-level feature extraction and mutual information minimization
- HLPrompt adds semantic-aware spatial prompt tuning
- VLPrompt uses factorized textual prompts and cross-modal consistency
- Methods adapt self-supervised vision and vision-language models
- Addresses domain and semantic shifts in unlabelled data
- Shared core design principles across methods
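One way to picture the cross-modal consistency regularization listed above is as a divergence penalty between two class distributions: one from a visual classification head and one from image-text similarities under the textual prompts. The sketch below uses a symmetric KL divergence over softmax distributions; the function names, temperature, and loss form are assumptions, not the preprint's exact formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_consistency(img_logits, img_feats, text_feats, tau=0.07):
    """Symmetric-KL consistency between the visual head's prediction
    and the image-text similarity distribution (a generic stand-in for
    a cross-modal consistency regularizer)."""
    # Cosine similarities between L2-normalised image and text features.
    img = img_feats / np.linalg.norm(img_feats, axis=-1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    p = softmax(img_logits)          # distribution from the visual head
    q = softmax(img @ txt.T / tau)   # distribution from image-text similarity
    kl = lambda a, b: (a * (np.log(a + 1e-9) - np.log(b + 1e-9))).sum(-1)
    return 0.5 * (kl(p, q) + kl(q, p)).mean()
```

When both modalities agree on the class distribution the loss is zero, so minimizing it pulls the visual head toward the text-prompt predictions and vice versa.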
Entities
Institutions
- arXiv