ARTFEED — Contemporary Art Intelligence

SF-NorMuon: Schedule-Free Spectral Optimizer Matches Tuned AdamW

ai-technology · 2026-05-25

Researchers have introduced SF-NorMuon, a schedule-free spectral optimizer that closes the performance gap between schedule-free methods and tuned AdamW baselines. On language models with 125M and 772M parameters, trained for 1–8× the Chinchilla-optimal token budget, SF-NorMuon matches or exceeds tuned AdamW with a single hyperparameter configuration. Because it needs no explicit learning-rate schedule, it produces high-quality checkpoints at any point during training without fixing the training horizon in advance. The authors prove a stationarity guarantee for the dynamics of schedule-free spectral methods and show that applying weight decay at the fast iterate is essential for stability over long horizons. The work addresses path dependence and the expensive re-tuning common in conventional neural network training.
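"Spectral" here refers to optimizers in the Muon family, which replace a matrix-shaped update G = U Σ Vᵀ with its orthogonalized form U Vᵀ, the steepest-descent direction under the spectral norm. The article does not spell out SF-NorMuon's update rule, so what follows is only a minimal sketch, assuming the classical cubic Newton–Schulz iteration for computing U Vᵀ without an SVD. The function name `newton_schulz_orthogonalize` and the step count are illustrative, and the sketch omits whatever additional normalization NorMuon applies on top of Muon.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the polar factor U @ V.T of g = U @ diag(s) @ V.T.

    Hypothetical helper using the classical cubic Newton-Schulz iteration;
    Muon-style optimizers typically use a tuned higher-order variant with
    far fewer steps.
    """
    # Dividing by the Frobenius norm (an upper bound on the spectral norm)
    # puts every singular value in (0, 1], inside the basin of convergence.
    x = g / (np.linalg.norm(g) + 1e-12)
    for _ in range(steps):
        # Maps each singular value s -> 1.5 s - 0.5 s**3, pushing it toward 1
        # while leaving the singular vectors unchanged.
        x = 1.5 * x - 0.5 * x @ (x.T @ x)
    return x


g = np.array([[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]])
q = newton_schulz_orthogonalize(g)
u, _, vt = np.linalg.svd(g, full_matrices=False)
print(np.allclose(q, u @ vt, atol=1e-6))
```

Iterating a cheap matrix polynomial rather than calling an SVD is what makes this kind of step practical on accelerators; for a full-rank input the result agrees with the exact U Vᵀ.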

Key facts

  • SF-NorMuon is a schedule-free spectral optimizer.
  • It matches or exceeds tuned AdamW on 125M and 772M parameter language models.
  • Evaluated across 1–8× Chinchilla horizons.
  • Uses a single hyperparameter configuration.
  • Removes explicit learning-rate schedules.
  • Enables anytime checkpointing without committing to a training horizon.
  • Proves a stationarity guarantee for schedule-free spectral dynamics.
  • Identifies weight decay at the fast iterate as essential for long-horizon stability.
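To make "schedule-free", "anytime checkpointing" and "fast iterate" concrete, here is a toy sketch that assumes the schedule-free SGD recipe of Defazio et al.: gradients are taken at an interpolation y between a fast iterate z and its running average x, z takes constant-size steps, and x is the model one would checkpoint at any step. The orthogonalized direction is computed with an exact SVD, and weight decay is applied to z, in line with the last bullet above. All names and constants (`schedule_free_spectral`, `lr`, `beta`, `weight_decay`, the quadratic `target`) are hypothetical; this is an illustration, not the paper's algorithm.

```python
import numpy as np


def polar(g: np.ndarray) -> np.ndarray:
    """Orthogonalized direction U @ V.T, computed with an exact thin SVD.

    Stand-in for the Newton-Schulz approximation used by Muon-style optimizers.
    """
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt


def schedule_free_spectral(grad_fn, w0, lr=0.05, beta=0.9, weight_decay=0.01, steps=300):
    """Hypothetical schedule-free loop with a spectral step and weight decay on z.

    Follows the schedule-free SGD template (gradient at an interpolation y,
    constant-step update of the fast iterate z, uniform averaging into x).
    """
    z = w0.copy()  # fast iterate: takes the raw optimizer steps
    x = w0.copy()  # slow iterate: running average of z, usable as a checkpoint
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x       # gradient is evaluated at y
        d = polar(grad_fn(y))                 # orthogonalized (spectral) direction
        z = z - lr * (d + weight_decay * z)   # constant lr; weight decay acts on z
        c = 1.0 / (t + 1)                     # no schedule: x = mean of z_0 .. z_t
        x = (1.0 - c) * x + c * z
    return x, z


# Toy quadratic: minimize 0.5 * ||W - target||_F^2 over 3x2 matrices.
target = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])


def loss(w: np.ndarray) -> float:
    return 0.5 * float(np.sum((w - target) ** 2))


def grad(w: np.ndarray) -> np.ndarray:
    return w - target


w0 = np.zeros_like(target)
x, z = schedule_free_spectral(grad, w0)
print(f"loss at start: {loss(w0):.4f}, loss at averaged iterate x: {loss(x):.4f}")
```

Because the learning rate never decays, no step count has to be chosen in advance: x can be saved at any iteration, and on this quadratic it should settle close to `target` while z keeps taking fixed-size steps around it.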
