SF-NorMuon: Schedule-Free Spectral Optimizer Matches Tuned AdamW

ai-technology · 2026-05-25

Researchers have introduced SF-NorMuon, a schedule-free spectral optimizer that closes the performance gap between schedule-free methods and tuned AdamW baselines. On language models with 125M and 772M parameters, trained across 1–8× Chinchilla horizons, SF-NorMuon matches or exceeds tuned AdamW with a single hyperparameter configuration. Because it uses no explicit learning-rate schedule, it yields usable checkpoints at any point in training without committing to a horizon in advance. The authors prove a stationarity guarantee for schedule-free spectral dynamics and identify weight decay applied at the fast iterate as essential for stability over long horizons. The work targets two costs of conventional training: path dependence on the chosen schedule and expensive re-tuning whenever the horizon changes.
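
To give a sense of scale for the horizons above, here is a rough token-budget calculation. It assumes that a 1× Chinchilla horizon means about 20 training tokens per parameter, a commonly cited rule of thumb; the brief does not state the exact ratio the authors used, and the helper name `chinchilla_tokens` is hypothetical.

```python
# Assumption: 1x Chinchilla is roughly 20 training tokens per parameter.
# The brief does not give the authors' exact ratio.
TOKENS_PER_PARAM = 20


def chinchilla_tokens(n_params, multiple, tokens_per_param=TOKENS_PER_PARAM):
    """Approximate token budget for a given multiple of the Chinchilla horizon."""
    return n_params * tokens_per_param * multiple


for n_params in (125_000_000, 772_000_000):
    for multiple in (1, 8):
        tokens = chinchilla_tokens(n_params, multiple)
        print(f"{n_params / 1e6:.0f}M params at {multiple}x: {tokens / 1e9:.2f}B tokens")
```

Under that assumption, the evaluated range would span roughly 2.5B to 20B tokens for the 125M model and about 15B to 124B tokens for the 772M model.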

Key facts

SF-NorMuon is a schedule-free spectral optimizer.
It matches or exceeds tuned AdamW on 125M- and 772M-parameter language models.
Evaluated across 1–8× Chinchilla horizons.
Uses a single hyperparameter configuration.
Removes explicit learning-rate schedules.
Enables anytime checkpointing without horizon commitment.
Proves a stationarity guarantee for schedule-free spectral dynamics.
Identifies weight decay at the fast iterate as essential for long-horizon stability.
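
For readers unfamiliar with the terms "schedule-free", "spectral" and "fast iterate", the following is a minimal NumPy sketch of how these pieces might fit together. It is not the authors' algorithm: it combines the generic schedule-free SGD recursion (a fast iterate `z`, an averaged iterate `x`, and gradients taken at an interpolation `y`) with a Muon-style orthogonalized update computed by exact SVD, and it omits any NorMuon-specific normalization. The function names, the default hyperparameters, and the way weight decay enters the `z` update are all assumptions made for illustration.

```python
import numpy as np


def orthogonalize(g):
    """Polar factor U @ Vt of g, used here as a spectral update direction.

    Assumption: exact SVD stands in for whatever approximation the paper uses.
    """
    u, _, vt = np.linalg.svd(g, full_matrices=False)
    return u @ vt


def sf_spectral_step(x, z, t, grad_fn, lr=0.05, beta=0.9, weight_decay=0.0):
    """One illustrative schedule-free step; t is the 1-based step index.

    x: averaged iterate (the weights you would checkpoint and evaluate)
    z: fast iterate (takes the raw optimizer steps)
    """
    # Gradient is evaluated at an interpolation between the two iterates.
    y = (1.0 - beta) * z + beta * x
    direction = orthogonalize(grad_fn(y))
    # Assumption: weight decay is applied to the fast iterate z, not to x.
    z_new = z - lr * (direction + weight_decay * z)
    # Uniform running average of the fast iterates replaces an LR schedule.
    c = 1.0 / (t + 1)
    x_new = (1.0 - c) * x + c * z_new
    return x_new, z_new


# Toy usage: minimize 0.5 * ||W - target||^2 for a 2x2 weight matrix.
target = np.eye(2)
grad_fn = lambda w: w - target
x = z = np.zeros((2, 2))
for t in range(1, 201):
    x, z = sf_spectral_step(x, z, t, grad_fn)
```

Because `x` is a running average that never depends on a planned stopping step, it can be saved at any point, which is the sense in which schedule-free methods allow anytime checkpointing.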

Sources

arXiv cs.AI — 2026-05-25