ARTFEED — Contemporary Art Intelligence

Versioned Late Materialization for Ultra-Long Sequence Training in Recommendation Systems at Scale

other · 2026-04-30

Versioned late materialization is a new paradigm that addresses the storage and I/O bottlenecks of training Deep Learning Recommendation Models (DLRMs) on ultra-long User Interaction History (UIH). The industry-standard 'Fat Row' approach pre-materializes full sequences into every training example, and the resulting data redundancy strains infrastructure, especially in multi-tenant environments. The proposed system instead stores each UIH once in a normalized, immutable tier and reconstructs sequences just-in-time during training through lightweight versioned pointers. A bifurcated protocol ensures Online-to-Offline (O2O) consistency by preventing future leakage across both streaming and batch training, while a read-optimized design further improves I/O efficiency.
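The core idea can be sketched in a few lines. This is a hypothetical illustration, not code from the paper: the names `EventStore`, `materialize`, and the tuple layout are assumptions chosen to show how a normalized, immutable event tier plus a versioned pointer replaces a pre-materialized 'Fat Row'.

```python
from dataclasses import dataclass, field

@dataclass
class EventStore:
    """Normalized, immutable tier: each user's interaction history is stored
    once, append-only, ordered by a monotonically increasing version id."""
    events: dict = field(default_factory=dict)  # user_id -> [(version, item_id)]

    def append(self, user_id, version, item_id):
        self.events.setdefault(user_id, []).append((version, item_id))

    def materialize(self, user_id, version_ptr, max_len):
        """Reconstruct the UIH sequence just-in-time: keep only events with
        version <= version_ptr, truncated to the most recent max_len items."""
        history = self.events.get(user_id, [])
        return [item for v, item in history if v <= version_ptr][-max_len:]

store = EventStore()
for v, item in enumerate(["a", "b", "c", "d"]):
    store.append("u1", v, item)

# A training example carries only a lightweight versioned pointer, not the
# full sequence; the sequence is rebuilt at training time.
example = {"user_id": "u1", "version_ptr": 2, "label": 1}
seq = store.materialize(example["user_id"], example["version_ptr"], max_len=3)
# seq == ["a", "b", "c"]: the event at version 3 is after the pointer, so it
# is excluded, which is exactly what prevents future leakage.
```

The design choice to key reconstruction on an immutable version pointer (rather than copying the sequence into each example) is what removes the redundancy: N training examples for one user share a single stored history.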

Key facts

  • arXiv:2604.24806v1
  • Announce Type: cross
  • Modern Deep Learning Recommendation Models (DLRMs) follow scaling laws with sequence length
  • Industry-standard 'Fat Row' paradigm pre-materializes sequences into every training example
  • Data redundancy is amplified in multi-tenant environments
  • Versioned late materialization eliminates redundancy by storing UIH once in a normalized, immutable tier
  • Reconstructs sequences just-in-time during training via lightweight versioned pointers
  • Bifurcated protocol ensures Online-to-Offline (O2O) consistency
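The O2O consistency requirement above can be made concrete with a small sketch. This is an assumed illustration (the function `visible_events` and the timestamp fields are invented for the example, not taken from the paper): both the streaming path and the batch backfill path reconstruct features from the same immutable log, and applying one shared cutoff rule is what keeps the two views identical and leakage-free.

```python
def visible_events(events, request_ts):
    """Events usable as features for an example logged at request_ts.
    Filtering on event_ts < request_ts blocks future leakage, and because
    both the streaming and the batch path apply the same rule to the same
    immutable log, their reconstructed feature views agree (O2O consistency)."""
    return [e for e in events if e["event_ts"] < request_ts]

log = [
    {"event_ts": 100, "item": "a"},
    {"event_ts": 200, "item": "b"},
    {"event_ts": 300, "item": "c"},  # occurred after the example below
]

example = {"request_ts": 250, "label": 1}

# Streaming training reconstructs near serving time; batch training
# reconstructs later from the log. The shared cutoff yields the same result.
features = [e["item"] for e in visible_events(log, example["request_ts"])]
# features == ["a", "b"]; the event at ts=300 is invisible to this example.
```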

Entities

Institutions

  • arXiv

Sources