GAC: Adaptive Mixing for Hybrid SFT-RL Post-Training

ai-technology · 2026-05-27

Researchers propose GAC, a noise-aware controller for hybrid post-training that adaptively mixes supervised fine-tuning and reinforcement learning signals. The method estimates gradient variance and disagreement between the two signals to compute a dynamic mixing weight, with smoothing, prior guidance, and bounded updates. Experiments on math, code, science, and logic benchmarks show consistent improvements over fixed and rule-based baselines, especially at larger model scales, with less than 1% training overhead.

Key facts

GAC is a noise-aware adaptive mixing controller for hybrid SFT-RL post-training.
Fixed mixing schedules cannot adapt when the relative noise of the SFT and RL signals changes during training.
GAC derives its adaptive mixing weight from online estimates of gradient variance and of disagreement between the two signals.
The method adds smoothing, prior guidance, and bounded updates.
The controller reuses existing training tensors.
Experiments cover math, code, science, and logic benchmarks.
Results show consistent improvements over strong fixed and rule-based baselines.
Gains are larger at larger model scales, and training overhead stays below 1%.
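
The summary does not give GAC's actual update rule, so the following is only a minimal sketch of how a controller of this kind could be put together. Everything in it is an assumption made for illustration: the class name `MixingController`, the inverse-variance target, the use of one minus cosine similarity as the disagreement measure, and all default values. It is meant to show how variance estimates, disagreement, smoothing, a prior, and bounded updates can fit into one step, not to reproduce the authors' method.

```python
import math
from dataclasses import dataclass


def dot(a, b):
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class MixingController:
    """Hypothetical noise-aware mixer; names and defaults are illustrative."""
    prior: float = 0.5           # assumed prior weight on the SFT signal
    prior_strength: float = 0.1  # pull of the target toward the prior
    ema: float = 0.9             # smoothing factor for variance estimates
    max_step: float = 0.05       # bound on the per-step change of the weight
    lo: float = 0.05             # lower clamp for the weight
    hi: float = 0.95             # upper clamp for the weight
    weight: float = 0.5          # current weight on the SFT gradient
    var_sft: float = 1.0         # smoothed SFT gradient variance
    var_rl: float = 1.0          # smoothed RL gradient variance

    def update(self, g_sft, g_rl, var_sft_obs, var_rl_obs):
        # Smoothing: exponential moving average of the noisy variance observations.
        self.var_sft = self.ema * self.var_sft + (1 - self.ema) * var_sft_obs
        self.var_rl = self.ema * self.var_rl + (1 - self.ema) * var_rl_obs

        # Disagreement: 1 - cosine similarity between the two gradients.
        norm = math.sqrt(dot(g_sft, g_sft) * dot(g_rl, g_rl))
        cos = dot(g_sft, g_rl) / norm if norm > 0 else 0.0
        disagreement = 1.0 - cos

        # Assumed target: inverse-variance weighting, so the less noisy
        # signal receives the larger share.
        eps = 1e-12
        inv_sft = 1.0 / (self.var_sft + eps)
        inv_rl = 1.0 / (self.var_rl + eps)
        noise_target = inv_sft / (inv_sft + inv_rl)

        # Prior guidance: blend the noise-based target with the prior.
        target = (self.prior_strength * self.prior
                  + (1 - self.prior_strength) * noise_target)

        # Move only as far as the signals disagree, then bound the step.
        gain = min(1.0, max(0.0, disagreement))
        delta = gain * (target - self.weight)
        delta = max(-self.max_step, min(self.max_step, delta))
        self.weight = max(self.lo, min(self.hi, self.weight + delta))
        return self.weight


def mix(weight, g_sft, g_rl):
    """Convex combination of the SFT and RL gradients."""
    return [weight * s + (1.0 - weight) * r for s, r in zip(g_sft, g_rl)]
```

In this sketch, calling `ctrl.update(g_sft, g_rl, var_sft_obs, var_rl_obs)` once per training step and passing the result to `mix` would yield the combined gradient. When the RL signal is the noisier one the weight drifts toward SFT, but never by more than `max_step` per step, and it does not move at all when the two gradients point in the same direction.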
