Cross-Modal Alignment for SemanticID Generation in Generative Recommendation

other · 2026-04-25

A new framework addresses three limitations of Generative Recommendation (GR) systems that use Semantic IDs (SIDs), discrete item tokens, to compress trillion-scale data. The problems are information degradation caused by the two-stage compression pipeline, semantic degradation caused by cascaded quantization, and modality distortion caused by quantizers that fail to align text and image features. The proposed solution integrates cross-modal alignment into SID generation to improve SID quality and, in turn, recommendation performance.
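To make "cascaded quantization" concrete, here is a minimal sketch of residual quantization, a widely used way to turn an item embedding into a multi-level SID. The summary does not say which quantizer the framework uses, so this choice is an assumption, and the codebooks, function names, and toy 2-D embeddings are hypothetical. Each level picks the codeword nearest to the current residual and subtracts it; whatever is left after the last level is the information the SID cannot represent.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    embedding: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]
) -> Tuple[List[int], Vector]:
    """Map an item embedding to a Semantic ID via cascaded (residual) quantization.

    At each level the nearest codeword to the current residual is chosen and
    subtracted before moving to the next level. Returns the SID (one codeword
    index per level) and the final residual, i.e. the part of the embedding
    the SID does not capture.
    """
    residual = list(embedding)
    sid: List[int] = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda k: sq_dist(residual, codebook[k]))
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid, residual


# Hypothetical 2-level codebooks for toy 2-D embeddings.
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],    # level 1: coarse directions
    [[0.1, 0.1], [-0.1, 0.1], [0.0, -0.1]],   # level 2: finer corrections
]

if __name__ == "__main__":
    sid, leftover = residual_quantize([1.05, 0.2], CODEBOOKS)
    print(sid, [round(x, 3) for x in leftover])
```

The nonzero leftover residual is a toy version of the loss the summary attributes to cascaded quantization: any feature that no codeword level happens to encode is simply dropped from the SID.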

Key facts

Generative Recommendation uses Semantic IDs for next-token prediction
Two-stage compression pipeline causes semantic loss
Cascaded quantization discards key multimodal features
Quantizers fail to align text and image modalities
New framework integrates cross-modal alignment
Proposed method addresses information, semantic, and modality degradation
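The summary does not describe how the cross-modal alignment is implemented. One common technique for aligning two modalities, assumed here purely for illustration, is a symmetric contrastive (InfoNCE-style) loss: an item's text and image embeddings form a positive pair, and the other items in the batch serve as negatives. The function names and the temperature value below are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(
    text: Sequence[Sequence[float]],
    image: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Symmetric InfoNCE loss over a batch of paired text/image embeddings.

    Row i of `text` and row i of `image` describe the same item (positive pair);
    every other row in the batch acts as a negative.
    """
    n = len(text)
    # sims[i][j]: temperature-scaled similarity of text i and image j.
    sims = [[cosine(t, v) / temperature for v in image] for t in text]

    def ce(logits: List[float], target: int) -> float:
        # Numerically stable cross-entropy: log-sum-exp minus the target logit.
        m = max(logits)
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        return lse - logits[target]

    text_to_image = sum(ce(sims[i], i) for i in range(n)) / n
    image_to_text = sum(ce([sims[j][i] for j in range(n)], i) for i in range(n)) / n
    return (text_to_image + image_to_text) / 2


if __name__ == "__main__":
    text_emb = [[1.0, 0.0], [0.0, 1.0]]
    aligned_img = [[0.9, 0.1], [0.1, 0.9]]   # image i matches text i
    swapped_img = [[0.1, 0.9], [0.9, 0.1]]   # images paired with the wrong items
    print(info_nce(text_emb, aligned_img) < info_nce(text_emb, swapped_img))
```

A batch whose text and image embeddings already agree yields a lower loss than one where they are mismatched, which is the property an alignment objective exploits. Where such a loss would sit relative to quantization in the proposed framework is not stated in the summary.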
