Hugging Face Introduces Delta Weight Sync for Efficient Async RL Training

ai-technology · 2026-05-27

Hugging Face has released Delta Weight Sync, a technique for asynchronous reinforcement learning (RL) that sharply reduces the data transferred from the trainer to the inference engines. It relies on the observation that 99% of bf16 weights are bit-identical between consecutive optimizer steps, so only the changed weights need to be sent. For Qwen3-0.6B, this shrinks the per-step payload from 1.2 GB to 20-35 MB. The delta file is written to a Hugging Face Bucket, and the vLLM inference engine fetches it from there without any direct network connection to the trainer, which makes disaggregated training across separate machines or locations possible. The approach builds on observations from Fireworks AI and Cursor. The PR (huggingface/trl#5417) adds a BF16ChangeDetector and a 30-line vLLM extension for applying sparse updates. For a 405B model, the delta is approximately 6 GB per step, compared with 810 GB for a full synchronization.
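
To make the idea concrete, here is a minimal, stdlib-only Python sketch of bit-level change detection and a sparse delta. It is not the code from huggingface/trl#5417: the helper names (to_bf16_bits, full_sync_bytes, encode_delta, apply_delta) and the payload layout (a uint32 count, then uint32 indices, then uint16 bf16 values) are hypothetical. It also assumes one plausible reason why most bf16 weights stay bit-identical: the optimizer updates fp32 master weights, and most updates are smaller than half the spacing between adjacent bf16 values, so the rounded bf16 value does not move.

```python
import struct
from array import array

BF16_BYTES = 2  # each bf16 weight occupies 2 bytes


def to_bf16_bits(x: float) -> int:
    """Round x to bf16 (round-to-nearest-even) and return the raw 16-bit pattern.
    NaN and infinity handling is omitted in this sketch."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def full_sync_bytes(num_params: int) -> int:
    """Bytes sent when every bf16 weight is transferred."""
    return num_params * BF16_BYTES


def encode_delta(old: array, new: array) -> bytes:
    """Keep only positions whose bf16 bits changed.
    Hypothetical layout: uint32 count, then uint32 indices, then uint16 values."""
    if len(old) != len(new):
        raise ValueError("weight arrays must have the same length")
    idx = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    vals = [new[i] for i in idx]
    n = len(idx)
    return struct.pack(f"<I{n}I{n}H", n, *idx, *vals)


def apply_delta(weights: array, payload: bytes) -> None:
    """Patch the inference-side bf16 copy in place."""
    (n,) = struct.unpack_from("<I", payload, 0)
    idx = struct.unpack_from(f"<{n}I", payload, 4)
    vals = struct.unpack_from(f"<{n}H", payload, 4 + 4 * n)
    for i, v in zip(idx, vals):
        weights[i] = v


# Hypothetical step: fp32 master weights where 10 of 1,000 updates are large
# enough to change the bf16 value and the rest are far below bf16 resolution.
master = [1.0] * 1000
updated = [w + (0.5 if i % 100 == 0 else 1e-6) for i, w in enumerate(master)]

before = array("H", (to_bf16_bits(w) for w in master))
after = array("H", (to_bf16_bits(w) for w in updated))

delta = encode_delta(before, after)
replica = array("H", before)  # the inference engine's current bf16 copy
apply_delta(replica, delta)

print(len(delta), full_sync_bytes(len(before)))  # 64 2000
assert replica == after
```

With 10 of 1,000 weights changed, the delta is 64 bytes versus 2,000 bytes for a full copy. In this layout each changed weight costs 6 bytes (index plus value) against 2 bytes per weight for a full sync, so the sparse format only pays off while the changed fraction stays well below one third. full_sync_bytes(600_000_000) reproduces the 1.2 GB figure for Qwen3-0.6B, and full_sync_bytes(405_000_000_000) the 810 GB figure for a 405B model.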

Key facts

Delta Weight Sync reduces the per-step payload from 1.2 GB to 20-35 MB for Qwen3-0.6B
99% of bf16 weights are bit-identical between consecutive RL optimizer steps
Uses a Hugging Face Bucket as the shared object store for weight transfer
No direct connectivity required between trainer and inference cluster
Demonstrated fully disaggregated training across three separate machines
Based on observations from Fireworks AI and Cursor
PR available at huggingface/trl#5417
Supports multi-replica inference without additional overhead
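
The transfer path can be sketched the same way. In the Python sketch below, a local directory stands in for the Hugging Face Bucket (DirectoryStore is a hypothetical stand-in, not the huggingface_hub API), and the key naming scheme and polling loop are assumptions. The trainer writes each step's delta once, and every inference replica polls the store on its own, so the trainer never connects to the replicas and its upload cost does not grow with the number of replicas. That is one way to read the "without additional overhead" claim.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


class DirectoryStore:
    """Hypothetical stand-in for a shared object store such as a Hugging Face Bucket.
    Here it is a local directory, not the real bucket API."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        # Write under a temporary name, then rename, so a polling reader
        # never sees a half-written delta file.
        tmp = self.root / (key + ".tmp")
        tmp.write_bytes(data)
        os.replace(tmp, self.root / key)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None


def delta_key(step: int) -> str:
    # Hypothetical naming scheme: one object per optimizer step.
    return f"delta-{step:08d}.bin"


class Trainer:
    def __init__(self, store: DirectoryStore) -> None:
        self.store = store

    def publish(self, step: int, delta: bytes) -> None:
        # One upload per step, regardless of how many replicas will read it.
        self.store.put(delta_key(step), delta)


class InferenceReplica:
    def __init__(self, name: str, store: DirectoryStore) -> None:
        self.name = name
        self.store = store
        self.applied_step = 0
        self.applied: List[bytes] = []

    def poll(self) -> int:
        """Apply every newly published delta in step order; return how many were applied."""
        count = 0
        while True:
            data = self.store.get(delta_key(self.applied_step + 1))
            if data is None:
                return count
            self.applied.append(data)  # a real engine would patch its weights here
            self.applied_step += 1
            count += 1


with tempfile.TemporaryDirectory() as d:
    store = DirectoryStore(Path(d))
    trainer = Trainer(store)
    replicas = [InferenceReplica(f"replica-{i}", store) for i in range(3)]
    for step in (1, 2, 3):
        trainer.publish(step, f"delta for step {step}".encode())
    pulled = [r.poll() for r in replicas]
    uploads = len(list(Path(d).glob("delta-*.bin")))

print(pulled, uploads)  # [3, 3, 3] 3
```

Because each delta only records positions changed since the previous step, a replica has to apply deltas strictly in order; a replica that joins late would first need a full weight snapshot, which this sketch does not model.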
