ARTFEED — Contemporary Art Intelligence

UCCL-Zip: Lossless Compression for GPU Communication in LLMs

ai-technology · 2026-04-24

UCCL-Zip builds lossless compression directly into GPU communication primitives to relieve the communication bottleneck in large language model (LLM) workloads. Unlike earlier techniques that shrink traffic through quantization or lossy compression, it preserves numerical accuracy. It supports both point-to-point (P2P) and collective communication without changes to user-facing APIs. For P2P, Uzip-P2P uses a split-send pipeline that overlaps compression with communication. For collectives, Uzip-NCCL integrates compression into NCCL's persistent kernel model through fused execution, which cuts redundant memory traffic and kernel launches.
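The split-send idea can be illustrated with a small CPU-side sketch. This is not UCCL-Zip's implementation: `zlib` stands in for whatever codec the system uses, a Python thread stands in for the communication path, and `split_send`, `receive`, and `chunk_size` are hypothetical names chosen for illustration. It shows only the overlap pattern, where chunk i is being sent while chunk i+1 is being compressed.

```python
import queue
import threading
import zlib


def split_send(payload: bytes, send, chunk_size: int = 1 << 20) -> int:
    """Compress `payload` chunk by chunk while earlier chunks are in flight.

    `send` is any callable that transmits one compressed block.
    Returns the total number of compressed bytes handed to `send`.
    """
    # Bounded queue: the compressor runs at most two chunks ahead of the sender.
    ready = queue.Queue(maxsize=2)
    errors = []

    def sender():
        # Keep draining after a failure so the producer never blocks forever.
        while (block := ready.get()) is not None:
            if not errors:
                try:
                    send(block)
                except Exception as exc:
                    errors.append(exc)

    worker = threading.Thread(target=sender)
    worker.start()
    sent_bytes = 0
    try:
        for offset in range(0, len(payload), chunk_size):
            block = zlib.compress(payload[offset:offset + chunk_size])
            sent_bytes += len(block)
            ready.put(block)  # the sender can transmit this block immediately
    finally:
        ready.put(None)  # sentinel: no more blocks
        worker.join()
    if errors:
        raise errors[0]
    return sent_bytes


def receive(blocks) -> bytes:
    """Decompress blocks in arrival order and reassemble the payload."""
    return b"".join(zlib.decompress(block) for block in blocks)
```

Passing `wire.append` as `send` (with `wire` a plain list) is enough to try the pipeline locally; `receive(wire)` should then reproduce the original bytes exactly. Larger chunks amortize per-block overhead, while smaller chunks expose transmissible data sooner, a trade-off any real pipeline has to tune.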

Key facts

  • UCCL-Zip integrates lossless compression into GPU communication primitives.
  • It avoids numerical errors from quantization or lossy compression.
  • Supports point-to-point and collective communication without API changes.
  • Uzip-P2P uses a split-send pipeline for P2P communication.
  • Uzip-NCCL integrates compression into NCCL's persistent kernel model.
  • Fused execution in Uzip-NCCL reduces redundant memory traffic and kernel launches.
  • Designed for large language model training.
  • Preserves high GPU efficiency by operating on large data blocks.
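The difference between lossless compression and quantization can be checked with a few lines of code. The following is a minimal sketch, not taken from UCCL-Zip: `zlib` is a stand-in codec, the symmetric int8 scheme is one generic example of lossy quantization, and all function names are hypothetical.

```python
import struct
import zlib


def pack_f32(values):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def lossless_roundtrip(raw: bytes) -> bytes:
    """Compress and decompress; the output bytes match the input exactly."""
    return zlib.decompress(zlib.compress(raw))


def int8_roundtrip(values):
    """Symmetric int8 quantization followed by dequantization (lossy)."""
    # Fall back to scale 1.0 when every value is zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) * scale for v in values]


def max_abs_error(original, restored):
    """Largest elementwise deviation between two equal-length lists."""
    return max(abs(a - b) for a, b in zip(original, restored))
```

On a non-empty list of floats, `lossless_roundtrip(pack_f32(values))` returns the same bytes that went in, whereas `max_abs_error(values, int8_roundtrip(values))` is generally nonzero. That bit-exactness is the property the summary attributes to UCCL-Zip.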
