Cross-Modal Alignment for Semantic ID Generation in Generative Recommendation
A new framework targets three limitations of Generative Recommendation (GR) systems, which use Semantic IDs (SIDs) to compress trillion-scale data for next-token prediction. The limitations are information degradation caused by the two-stage compression pipeline, semantic degradation caused by cascaded quantization, and modality distortion between text and image features. The proposed solution integrates cross-modal alignment into SID generation to improve both SID quality and recommendation performance.
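The summary does not say how the framework performs cross-modal alignment. As a rough illustration only, one common way to pull paired text and image embeddings together is a symmetric contrastive (InfoNCE-style) loss. The sketch below assumes that technique; the function names (`alignment_loss`, `cross_entropy_diag`) and the temperature value are hypothetical and not taken from the source.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guard against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def alignment_loss(text_embs, image_embs, temperature=0.07):
    """Symmetric contrastive loss over paired text/image embeddings.

    Assumption: row i of text_embs and row i of image_embs describe the
    same item. This is an illustrative stand-in, not the source's method.
    """
    t = [l2_normalize(v) for v in text_embs]
    im = [l2_normalize(v) for v in image_embs]
    # Cosine similarity of every text vector with every image vector.
    logits = [[dot(a, b) / temperature for b in im] for a in t]
    logits_t = [list(col) for col in zip(*logits)]  # image-to-text view
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

# Toy data: two items with 2-d embeddings (made-up values).
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # image i resembles text i
swapped = [[0.1, 0.9], [0.9, 0.1]]   # image i resembles the other text
loss_aligned = alignment_loss(text, aligned)
loss_swapped = alignment_loss(text, swapped)
```

In this toy setup the loss is lower when each image embedding sits close to its own text embedding than when the pairs are swapped, which is the property an alignment objective is meant to reward.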
Key facts
- Generative Recommendation uses Semantic IDs for next-token prediction
- Two-stage compression pipeline causes information degradation
- Cascaded quantization discards key multimodal features
- Quantizers fail to align text and image modalities
- New framework integrates cross-modal alignment
- Proposed method addresses information degradation, semantic degradation, and modality distortion
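To make "cascaded quantization" concrete, here is a minimal sketch of residual quantization, a scheme often used to build Semantic IDs: each level picks the nearest codeword and passes the leftover residual to the next level. Treating the cascade as residual quantization is an assumption, as the summary does not name the quantizer; the codebooks, the input vector, and the helper names (`nearest`, `residual_quantize`) are hypothetical.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )

def residual_quantize(vec, codebooks):
    """Quantize vec level by level; return (semantic_id, final_residual).

    The semantic ID is the list of chosen codeword indices, one per level.
    The final residual is whatever the cascade failed to encode.
    """
    residual = list(vec)
    sid = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        sid.append(k)
        # Subtract the chosen codeword; the next level sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return sid, residual

# Hypothetical two-level codebooks over 2-d embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],   # coarse level
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],  # fine level
]
sid, leftover = residual_quantize([1.2, 1.0], codebooks)
```

For this input, `sid` is `[1, 1]` and `leftover` is approximately `[-0.05, 0.0]`. The leftover is the part of the embedding that the ID sequence cannot represent, which is one way to picture how a cascade can discard features.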
Entities
Institutions
- arXiv