Generalized Category Discovery Under Domain Shifts Using Foundation Models
A recent arXiv preprint (2605.00906) presents three frameworks for Generalized Category Discovery (GCD) under domain shifts, adapting foundation models that range from self-supervised vision backbones to vision-language models. The first approach, HiLo, disentangles domain and semantic attributes through multi-level feature extraction, mutual information minimization, PatchMix augmentation, and curriculum sampling. Building on HiLo, HLPrompt adds semantic-aware spatial prompt tuning to mitigate background and domain interference. VLPrompt instead leverages vision-language models, combining factorized textual prompts with cross-modal consistency regularization. All three frameworks share a common set of design principles. The work addresses the practical setting in which unlabelled data exhibits both domain and semantic shifts, relaxing the single-domain assumption of prior GCD methods.
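The patch-level mixing idea behind PatchMix can be illustrated with a minimal sketch: swap a random subset of square patches from one image into another, and keep the mixing ratio for proportional label interpolation. This is a generic illustration, not the preprint's exact implementation; the function name, patch size, and ratio parameter are assumptions.

```python
import numpy as np

def patchmix(img_a, img_b, patch=4, ratio=0.3, rng=None):
    """Swap a random subset of square patches from img_b into img_a.

    img_a, img_b: arrays of shape (H, W, C) with H and W divisible by `patch`.
    ratio: target fraction of patches taken from img_b.
    Returns the mixed image and the realized mixing ratio (for label mixing).
    """
    rng = rng or np.random.default_rng()
    h, w, _ = img_a.shape
    gh, gw = h // patch, w // patch          # patch-grid dimensions
    n = gh * gw
    k = int(round(ratio * n))                # number of patches to swap
    idx = rng.choice(n, size=k, replace=False)
    mixed = img_a.copy()
    for i in idx:
        r, c = divmod(i, gw)
        rs, cs = r * patch, c * patch
        mixed[rs:rs + patch, cs:cs + patch] = img_b[rs:rs + patch, cs:cs + patch]
    return mixed, k / n
```

In a cross-domain setting, `img_a` and `img_b` would typically come from different domains, so the mixed sample interpolates domain statistics as well as content.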
Key facts
- arXiv preprint 2605.00906
- Three frameworks for GCD under domain shifts
- HiLo uses multi-level feature extraction and mutual information minimization
- HLPrompt adds semantic-aware spatial prompt tuning
- VLPrompt uses factorized textual prompts and cross-modal consistency
- Methods adapt self-supervised vision and vision-language models
- Addresses domain and semantic shifts in unlabelled data
- Shared core design principles across methods
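One way to picture the cross-modal consistency regularization listed above is as a divergence penalty between two class distributions: one from a visual classification head and one from image-text similarities under the textual prompts. The sketch below uses a symmetric KL divergence over softmax distributions; the function names, temperature, and loss form are assumptions, not the preprint's exact formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_modal_consistency(img_logits, img_feats, text_feats, tau=0.07):
    """Symmetric-KL consistency between the visual head's prediction
    and the image-text similarity distribution (a generic stand-in for
    a cross-modal consistency regularizer)."""
    # Cosine similarities between L2-normalised image and text features.
    img = img_feats / np.linalg.norm(img_feats, axis=-1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    p = softmax(img_logits)          # distribution from the visual head
    q = softmax(img @ txt.T / tau)   # distribution from image-text similarity
    kl = lambda a, b: (a * (np.log(a + 1e-9) - np.log(b + 1e-9))).sum(-1)
    return 0.5 * (kl(p, q) + kl(q, p)).mean()
```

When both modalities agree on the class distribution the loss is zero, so minimizing it pulls the visual head toward the text-prompt predictions and vice versa.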
Entities
Institutions
- arXiv