ARTFEED — Contemporary Art Intelligence

Hugging Face Introduces Delta Weight Sync for Efficient Async RL Training

ai-technology · 2026-05-27

Hugging Face has introduced Delta Weight Sync, a technique for asynchronous reinforcement learning (RL) that sharply reduces the data transferred between trainers and inference engines. It relies on the observation that 99% of bf16 weights are bit-identical between consecutive optimizer steps, so only the changed weights need to be transmitted. For Qwen3-0.6B, this shrinks the per-step payload from 1.2 GB to 20-35 MB. The delta file is stored in a Hugging Face Bucket, from which the vLLM inference engine retrieves it, so the trainer and the inference cluster need no direct network connection to each other and training can be disaggregated across machines or locations. The method builds on observations from Fireworks AI and Cursor. The PR (huggingface/trl#5417) includes a BF16ChangeDetector and a 30-line vLLM extension for sparse updates. For a 405B model, the delta is approximately 6 GB per step, compared to 810 GB for a full synchronization.
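
The idea of sending only the weights whose bf16 bit pattern changed can be illustrated with a minimal sketch. This is not the BF16ChangeDetector or the vLLM extension from the PR, whose internals are not described here; the function names (to_bf16_bits, compute_delta, apply_delta) and the toy weight values are hypothetical, and the example assumes a trainer that keeps higher-precision master weights and rounds them to bf16 for inference.

```python
import struct


def to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern of x (round to nearest even).

    Sketch only: NaN is not handled specially.
    """
    # Reinterpret x as float32 bits, then keep the top 16 bits after rounding.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def compute_delta(old_bits, new_bits):
    """Return (indices, values) for positions whose bf16 pattern changed."""
    if len(old_bits) != len(new_bits):
        raise ValueError("weight lists must have the same length")
    indices = [i for i, (a, b) in enumerate(zip(old_bits, new_bits)) if a != b]
    values = [new_bits[i] for i in indices]
    return indices, values


def apply_delta(bits, indices, values):
    """Overwrite the changed positions in place (the receiving side)."""
    for i, v in zip(indices, values):
        bits[i] = v


# Toy "optimizer step" (hypothetical values): three tiny updates that vanish
# when rounded to bf16, and one larger update that survives the rounding.
master_before = [1.0, 0.5, -2.0, 0.25]
master_after = [1.0 + 1e-5, 0.5 + 1e-5, -2.05, 0.25 + 1e-5]

old_bits = [to_bf16_bits(w) for w in master_before]
new_bits = [to_bf16_bits(w) for w in master_after]
indices, values = compute_delta(old_bits, new_bits)

# The inference side starts from the old weights and applies only the delta.
replica = list(old_bits)
apply_delta(replica, indices, values)
assert replica == new_bits

print(f"{len(indices)} of {len(new_bits)} weights changed at indices {indices}")
```

In this toy case only the weight that received the large update changes its bf16 pattern, so the delta carries one index and one value instead of all four weights. A real implementation would operate on tensors and choose its own serialization for the delta file.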

Key facts

  • Delta Weight Sync reduces per-step payload from 1.2 GB to 20-35 MB for Qwen3-0.6B
  • 99% of bf16 weights are bit-identical between consecutive RL optimizer steps
  • Uses Hugging Face Bucket as shared object store for weight transfer
  • No direct connectivity required between trainer and inference cluster
  • Demonstrated fully disaggregated training across three separate machines
  • Based on observations from Fireworks AI and Cursor
  • PR available at huggingface/trl#5417
  • Supports multi-replica inference without additional overhead
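
The full-sync figures quoted in this article follow from bf16 using 2 bytes per weight. The short sketch below reproduces that arithmetic and the implied reduction factors. It assumes decimal units (1 GB = 10^9 bytes) and nominal parameter counts of 0.6 billion and 405 billion; the delta sizes are taken from the article as given, not derived.

```python
BYTES_PER_BF16_WEIGHT = 2  # bf16 stores each weight in 16 bits


def full_sync_bytes(num_params: int) -> int:
    """Bytes needed to send every bf16 weight of a model once."""
    return num_params * BYTES_PER_BF16_WEIGHT


# Nominal parameter counts (assumed round numbers).
small_full = full_sync_bytes(600_000_000)        # Qwen3-0.6B
large_full = full_sync_bytes(405_000_000_000)    # 405B model

# Delta sizes as reported in the article, in bytes.
small_delta_low, small_delta_high = 20e6, 35e6
large_delta = 6e9

print(f"0.6B full sync: {small_full / 1e9:.1f} GB")
print(f"405B full sync: {large_full / 1e9:.0f} GB")
print(f"0.6B reduction: {small_full / small_delta_high:.0f}x "
      f"to {small_full / small_delta_low:.0f}x")
print(f"405B reduction: {large_full / large_delta:.0f}x")
```

Under these assumptions the reported deltas correspond to a reduction of roughly 34x to 60x for the small model and 135x for the 405B model.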

Entities

Institutions

  • Hugging Face
  • Fireworks AI
  • Cursor
  • vLLM

Sources