ARTFEED — Contemporary Art Intelligence

Generalized Category Discovery Under Domain Shifts Using Foundation Models

ai-technology · 2026-05-06

A recent preprint on arXiv (2605.00906) presents three frameworks for Generalized Category Discovery (GCD) under domain shifts, built on foundation models that range from self-supervised vision backbones to vision-language models. The first, HiLo, separates domain and semantic features through multi-level feature extraction, mutual information minimization, PatchMix augmentation, and curriculum sampling. HLPrompt builds on HiLo by adding semantic-aware spatial prompt tuning to reduce interference from backgrounds and domain cues. VLPrompt brings in vision-language models, combining factorized textual prompts with cross-modal consistency regularization. All three share the same core design principles. The work targets a practical setting in which unlabelled data vary in both domain and semantics, challenging the single-domain assumption of earlier GCD methods.
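To make the idea of patch-level mixing more concrete, here is a minimal sketch of a generic, CutMix-style mix over ViT patch tokens. It is an assumption-laden illustration, not the preprint's PatchMix: the function names `patch_mix` and `mix_labels`, the random choice of swapped positions, and the area-weighted soft labels are all hypothetical choices made for this example.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical illustration only: a generic patch-token mixing step in the
# spirit of "PatchMix". The preprint's actual PatchMix may choose patches,
# partner images, and loss weights differently.

Token = Sequence[float]


def patch_mix(
    tokens_a: Sequence[Token],
    tokens_b: Sequence[Token],
    ratio: float,
    rng: random.Random,
) -> Tuple[List[List[float]], float, List[int]]:
    """Replace a `ratio` fraction of image A's patch tokens with image B's
    tokens at the same grid positions.

    Returns the mixed token sequence, the fraction `lam` of tokens kept from A,
    and the sorted indices of the swapped positions.
    """
    n = len(tokens_a)
    if n == 0 or n != len(tokens_b):
        raise ValueError("token sequences must be non-empty and equal length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    k = round(ratio * n)                    # number of positions to swap
    swapped = set(rng.sample(range(n), k))  # positions taken from image B
    mixed = [list(tokens_b[i]) if i in swapped else list(tokens_a[i]) for i in range(n)]
    lam = 1.0 - k / n                       # share of tokens still from A
    return mixed, lam, sorted(swapped)


def mix_labels(y_a: Sequence[float], y_b: Sequence[float], lam: float) -> List[float]:
    """Soft target weighted by how much of each image survives the mix."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]


if __name__ == "__main__":
    a = [[0.0] * 4 for _ in range(16)]  # 16 patch tokens of dimension 4
    b = [[1.0] * 4 for _ in range(16)]
    mixed, lam, idx = patch_mix(a, b, ratio=0.25, rng=random.Random(0))
    print(lam, idx)
    print(mix_labels([1.0, 0.0], [0.0, 1.0], lam))
```

With 16 tokens and `ratio=0.25`, four positions come from image B, so the soft label keeps 75% of A's target. Mixing at the token level rather than on raw pixels keeps the operation aligned with the transformer's patch grid; how the preprint pairs images across domains is not described in this summary.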

Key facts

  • arXiv preprint 2605.00906
  • Three frameworks for GCD under domain shifts
  • HiLo uses multi-level feature extraction and mutual information minimization
  • HLPrompt adds semantic-aware spatial prompt tuning
  • VLPrompt uses factorized textual prompts and cross-modal consistency
  • Methods adapt self-supervised vision and vision-language models
  • Addresses domain and semantic shifts in unlabelled data
  • Shared core design principles across methods
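Cross-modal consistency regularization can take several forms, and the summary does not state VLPrompt's exact loss. The sketch below assumes one common choice: a symmetric KL divergence between the class distribution from a visual head and the distribution implied by image-text cosine similarities, with one (hypothetical) text embedding per class, such as one produced from a prompt. The function names and the 0.07 temperature are illustrative assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical illustration only: one common way to express a cross-modal
# consistency penalty. The preprint's exact formulation is not given in this
# summary; here we assume a symmetric KL divergence between two class
# distributions for the same image.


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    # KL(p || q); eps guards against log(0) when q underflows
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        raise ValueError("zero-length embedding")
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def cross_modal_consistency(
    visual_logits: Sequence[float],
    image_emb: Sequence[float],
    text_embs: Sequence[Sequence[float]],
    temperature: float = 0.07,
) -> float:
    """Symmetric KL between a visual head's class distribution and the
    distribution implied by image-text cosine similarities (one text
    embedding per class)."""
    p_vis = softmax(visual_logits)
    p_txt = softmax([cosine(image_emb, t) for t in text_embs], temperature)
    return 0.5 * (kl_divergence(p_vis, p_txt) + kl_divergence(p_txt, p_vis))
```

The penalty is near zero when the visual head and the text side agree on the class, and grows when they disagree. Using the symmetric form avoids privileging either modality as the "teacher", which is one reasonable reading of "consistency" but not necessarily the one the authors chose.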

Entities

Institutions

  • arXiv

Sources