ARTFEED — Contemporary Art Intelligence

SpecPL: Spectral Prompt Learning for Vision-Language Models

ai-technology · 2026-05-07

SpecPL is a new prompt learning technique for vision-language models (VLMs) that tackles modality asymmetry by disentangling spectral granularity. Current methods typically optimize text tokens while relying on a static visual encoder that overlooks fine spectral detail. In contrast, SpecPL uses a frozen VAE to decompose visual signals into semantic low-frequency bands and detailed high-frequency components. A Visual Semantic Bank anchors text representations to low-frequency invariants, which helps reduce overfitting. Fine-grained discrimination comes from counterfactual granule training, which permutes high-frequency signals and so forces the model to distinguish visual granularity from semantic invariance. The method is described in a paper on arXiv (ID 2605.04504).
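To make the decomposition concrete, here is a minimal NumPy sketch of the two ideas in the summary: splitting a signal into low- and high-frequency parts, and building counterfactual samples by permuting the high-frequency part across a batch. It is an illustration under stated assumptions, not the paper's implementation. An ideal FFT low-pass mask stands in for the frozen VAE, and the names `split_bands`, `counterfactual_granules`, and `cutoff` are hypothetical.

```python
import numpy as np

def split_bands(images, cutoff=0.25):
    """Split a batch of images (B, H, W) into low- and high-frequency parts.

    Assumption: an ideal FFT low-pass mask stands in for the frozen VAE
    mentioned in the article. `cutoff` is a hypothetical parameter in
    cycles per pixel (0.5 is the Nyquist limit).
    """
    _, h, w = images.shape
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies, shape (1, W)
    mask = np.sqrt(fx**2 + fy**2) <= cutoff  # True inside the low-pass disc
    spectrum = np.fft.fft2(images, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real
    high = images - low                      # residual carries the fine detail
    return low, high

def counterfactual_granules(images, rng, cutoff=0.25):
    """Keep each image's low band but borrow the high band of another image."""
    low, high = split_bands(images, cutoff)
    perm = rng.permutation(len(images))      # hypothetical shuffling scheme
    return low + high[perm], perm

rng = np.random.default_rng(0)
batch = rng.random((4, 32, 32))              # toy stand-in for real images
low, high = split_bands(batch)
mixed, perm = counterfactual_granules(batch, rng)
```

Each entry of `mixed` keeps its own coarse structure but carries the fine detail of another sample, so a model trained on such inputs cannot rely on high-frequency cues alone to recover the semantic content.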

Key facts

  • SpecPL stands for Disentangling Spectral Granularity for Prompt Learning.
  • It addresses modality asymmetry in VLM prompt learning.
  • Uses a frozen VAE to decompose visual signals.
  • Separates signals into low-frequency (semantic) and high-frequency (granular) bands.
  • Employs a frozen Visual Semantic Bank for low-frequency anchoring.
  • Counterfactual granule training permutes high-frequency signals.
  • Paper available on arXiv with ID 2605.04504.
  • Announced on arXiv as a cross-listing (announcement type "cross").
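The low-frequency anchoring listed above can be pictured as a penalty that pulls each learnable text feature toward a fixed visual prototype. The sketch below is a guess at the general shape of such a term, not the paper's loss. The article does not say how the Visual Semantic Bank is built or which distance is used, so the cosine form, the one-prototype-per-class layout, and the names `anchor_loss` and `l2_normalize` are all assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def anchor_loss(text_feats, bank):
    """Mean (1 - cosine similarity) between text features and bank rows.

    Assumption: row i of `bank` is a fixed low-frequency prototype for
    class i, and row i of `text_feats` is the learnable prompt feature
    for the same class. The cosine form is a stand-in, not the paper's loss.
    """
    cos = np.sum(l2_normalize(text_feats) * l2_normalize(bank), axis=-1)
    return float(np.mean(1.0 - cos))

rng = np.random.default_rng(0)
bank = rng.normal(size=(5, 16))        # frozen: never updated during training
text_feats = rng.normal(size=(5, 16))  # would come from learnable prompts

loss_random = anchor_loss(text_feats, bank)  # unaligned features
loss_aligned = anchor_loss(bank, bank)       # perfectly aligned features
```

The loss is near zero when text features point the same way as their prototypes and grows toward 2 as they point the opposite way. Because the bank itself is not updated, only the prompt side can move to reduce it.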

Entities

Institutions

  • arXiv

Sources