ARTFEED — Contemporary Art Intelligence

Cross-Modal Alignment for Semantic ID Generation in Generative Recommendation

other · 2026-04-25

A new framework targets three limitations of Generative Recommendation (GR) systems, which use Semantic IDs (SIDs) to compress trillion-scale data into discrete tokens. The limitations are information degradation from two-stage compression, semantic degradation from cascaded quantization, and modality distortion between text and image features. The proposed solution integrates cross-modal alignment into SID generation to improve SID quality and recommendation performance.
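To make "cascaded quantization" concrete, here is a minimal sketch of how a multi-level Semantic ID could be produced from an item embedding. It assumes a residual-style cascade, in which each level quantizes what the previous level left over. That is a common choice in GR work but is not confirmed as this framework's design. The names `nearest` and `semantic_id` and the toy codebooks are hypothetical.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def semantic_id(embedding, codebooks):
    """Quantize an embedding level by level (assumed residual cascade).

    Each level picks the nearest code for the current residual, then
    subtracts it. Returns the tuple of chosen indices (the SID) and the
    residual that no level captured.
    """
    residual = list(embedding)
    sid = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(sid), residual


# Toy example: two levels, two codes each, 2-dimensional embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # coarse level
    [[0.0, 0.0], [0.5, 0.0]],   # finer level applied to the residual
]
sid, leftover = semantic_id([1.4, 1.0], codebooks)
print(sid, leftover)
```

The leftover residual is the part of the embedding that the SID does not encode, which is one way to picture the information loss described above.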

Key facts

  • Generative Recommendation uses Semantic IDs for next-token prediction
  • Two-stage compression pipeline causes semantic loss
  • Cascaded quantization discards key multimodal features
  • Quantizers fail to align text and image modalities
  • New framework integrates cross-modal alignment
  • Proposed method addresses information, semantic, and modality degradation
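The summary does not say how the framework aligns text and image features. As an illustration only, the sketch below uses a symmetric contrastive (InfoNCE-style) loss, a widely used alignment objective swapped in here as an assumption. The function names and the `temperature` value are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (all-zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to targets[i] among all targets."""
    anchors = [l2_normalize(a) for a in anchors]
    targets = [l2_normalize(t) for t in targets]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, t)) / temperature for t in targets]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def alignment_loss(text_embs, image_embs, temperature=0.1):
    """Symmetric loss: text-to-image and image-to-text directions averaged."""
    return 0.5 * (
        info_nce(text_embs, image_embs, temperature)
        + info_nce(image_embs, text_embs, temperature)
    )


# Toy example: row i of each list describes the same item.
text = [[1.0, 0.0], [0.0, 1.0]]
image_aligned = [[0.9, 0.1], [0.1, 0.9]]
image_shuffled = [[0.1, 0.9], [0.9, 0.1]]
print(alignment_loss(text, image_aligned) < alignment_loss(text, image_shuffled))
```

Under this assumed objective, a lower loss means each item's text and image embeddings sit closer to each other than to other items' embeddings, so a downstream quantizer sees less disagreement between the two modalities.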

Entities

Institutions

  • arXiv