ARTFEED — Contemporary Art Intelligence

GAC: Adaptive Mixing for Hybrid SFT-RL Post-Training

ai-technology · 2026-05-27

Researchers propose GAC, a noise-aware controller for hybrid post-training that adaptively mixes supervised fine-tuning and reinforcement learning signals. The method estimates gradient variance and disagreement between the two signals to compute a dynamic mixing weight, with smoothing, prior guidance, and bounded updates. Experiments on math, code, science, and logic benchmarks show consistent improvements over fixed and rule-based baselines, especially at larger model scales, with less than 1% training overhead.
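The summary does not give GAC's actual formula, so the following is only a minimal sketch of how a controller of this kind could be wired together. Everything in it is an assumption made for illustration: the function name `update_mixing_weight`, the inverse-variance rule, the use of cosine distance as the disagreement measure, and all default constants are hypothetical and not taken from the paper.

```python
import math


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two flat gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def update_mixing_weight(prev_w, g_sft, g_rl, var_sft, var_rl,
                         prior_w=0.5, smooth=0.9, max_step=0.05, eps=1e-12):
    """Return the new weight on the SFT signal; the RL weight is 1 - w.

    Hypothetical rule, not the published method:
      1. raw weight from inverse-variance weighting (noisier signal gets less),
      2. prior guidance: pull toward prior_w when the gradients agree,
      3. smoothing: exponential moving average with the previous weight,
      4. bounded update: limit the per-step change to max_step.
    """
    # 1. The SFT share grows as the RL gradient becomes noisier.
    raw = var_rl / (var_sft + var_rl + eps)

    # Disagreement in [0, 1]: 0 for aligned gradients, 1 for opposed ones.
    disagreement = 0.5 * (1.0 - cosine(g_sft, g_rl))

    # 2. Trust the noise estimate only as far as the signals disagree.
    target = disagreement * raw + (1.0 - disagreement) * prior_w

    # 3. Smooth against the previous weight.
    smoothed = smooth * prev_w + (1.0 - smooth) * target

    # 4. Bound the step, then keep the weight a valid mixing coefficient.
    step = clamp(smoothed - prev_w, -max_step, max_step)
    return clamp(prev_w + step, 0.0, 1.0)


# Example: opposed gradients, with the RL signal three times noisier.
w = update_mixing_weight(0.5, g_sft=[1.0, 0.0], g_rl=[-1.0, 0.0],
                         var_sft=1.0, var_rl=3.0)
sft_weight, rl_weight = w, 1.0 - w
```

In this sketch the variance and gradient inputs would come from quantities a training loop already computes, which is one plausible reading of how a controller like this keeps overhead low.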

Key facts

  • GAC is a noise-aware adaptive mixing controller for hybrid SFT-RL post-training.
  • Fixed mixing schedules cannot adapt when the relative noise of the two signals changes during training.
  • GAC derives an adaptive mixing weight from online estimates of gradient variance and of disagreement between the SFT and RL signals.
  • The method adds smoothing, prior guidance, and bounded updates.
  • The method reuses existing training tensors.
  • Experiments on math, code, science, and logic benchmarks.
  • Consistent improvements over strong fixed and rule-based baselines.
  • Gains are larger at larger model scales, and training overhead stays below 1%.

Entities

Institutions

  • arXiv
