ScheduleFree+ Outperforms WSD Schedules in LLM Training

ai-technology · 2026-05-20

A new machine learning method, ScheduleFree+, extends Schedule-Free Learning to large language model (LLM) training by fixing the problems that limited its scaling to larger batch sizes and model sizes. The method eliminates the need for learning-rate schedules and outperforms Warmup-Stable-Decay (WSD) schedules. At 1000 tokens per parameter, it achieves a 31% improvement over state-of-the-art schedules. The approach also provides a theoretical foundation for model averaging and checkpoint merging during pretraining.
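For context on what ScheduleFree+ removes, here is a generic Warmup-Stable-Decay schedule in Python, together with the token-budget arithmetic behind a "tokens per parameter" figure. This is a minimal sketch under stated assumptions. The linear warmup and decay shapes, the 1% warmup and 20% decay fractions, and the helper names `wsd_lr` and `total_steps_for_budget` are hypothetical choices for illustration, not the configuration used in the paper. Notice that the decay phase is placed relative to a chosen final step, and a schedule-free method does not need that choice.

```python
# Minimal sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule.
# Assumptions (not from the article): linear warmup, linear decay toward zero,
# and the default warmup/decay fractions below. Helper names are hypothetical.


def total_steps_for_budget(n_params: int, tokens_per_param: int,
                           tokens_per_step: int) -> int:
    """Number of optimizer steps needed to see tokens_per_param * n_params tokens."""
    total_tokens = n_params * tokens_per_param
    return total_tokens // tokens_per_step


def wsd_lr(step: int, total_steps: int, peak_lr: float,
           warmup_frac: float = 0.01, decay_frac: float = 0.2) -> float:
    """Return the learning rate at `step` (0-indexed) under a WSD schedule."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    decay_steps = max(1, int(decay_frac * total_steps))
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable: hold the peak learning rate constant.
        return peak_lr
    # Decay: ramp linearly down toward zero over the final decay_steps steps.
    remaining = total_steps - step
    return peak_lr * remaining / decay_steps


if __name__ == "__main__":
    # Hypothetical budget: 1B parameters, 1000 tokens/parameter, 1M tokens/step.
    steps = total_steps_for_budget(1_000_000_000, 1000, 1_000_000)
    for s in (0, steps // 2, steps - 1):
        print(s, wsd_lr(s, steps, peak_lr=3e-4))
```

For example, a hypothetical 1B-parameter model at 1000 tokens per parameter sees 1T tokens. With an assumed 1M tokens per optimizer step, that is 1,000,000 steps, and `wsd_lr` holds the peak rate until step 800,000 before decaying.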

Key facts

ScheduleFree+ is a learning-rate-free and schedule-free method for training LLMs.
It outperforms Warmup-Stable-Decay (WSD) schedules.
At 1000 tokens per parameter, it outperforms state-of-the-art (SOTA) schedules by 31%.
Schedule-Free Learning has shown success across dozens of standard benchmark problems.
Strong performance for LLM training was previously only demonstrated at small scales.
The method provides a theoretical foundation for model averaging and checkpoint merging.
The paper identifies fixes necessary to scale up Schedule-Free Learning to larger batch sizes and model sizes.
Schedule-Free Learning is most effective for long-duration training.
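The link between Schedule-Free Learning and model averaging can be made concrete with the base Schedule-Free SGD update from the original Schedule-Free work (Defazio et al., 2024, "The Road Less Scheduled"). This is a minimal sketch under stated assumptions. It implements only that original update, with uniform averaging weights and a constant learning rate on a toy 1-D quadratic. It does not model the additional fixes ScheduleFree+ introduces, because the article does not describe them. The objective, `lr`, `beta`, step count, and the function names are hypothetical.

```python
# Minimal sketch of base Schedule-Free SGD (Defazio et al., 2024) on a 1-D quadratic.
# This is NOT the ScheduleFree+ variant: its scaling fixes are not described in
# the article, so none are modeled. Objective, lr, beta and names are illustrative.


def grad(w: float, a: float = 1.0, w_star: float = 0.0) -> float:
    """Gradient of f(w) = 0.5 * a * (w - w_star) ** 2."""
    return a * (w - w_star)


def schedule_free_sgd(w0: float, lr: float = 0.1, beta: float = 0.9,
                      steps: int = 100):
    """Run Schedule-Free SGD with a constant learning rate (no schedule).

    Returns the averaged iterate x (the weights used for evaluation) and the
    history of base iterates z, so the averaging can be checked.
    """
    z = w0  # base SGD sequence
    x = w0  # averaged sequence; starts equal to z
    z_history = [z]
    for t in range(1, steps + 1):
        # The gradient is evaluated at an interpolation between z and x.
        y = (1.0 - beta) * z + beta * x
        z = z - lr * grad(y)
        z_history.append(z)
        # Uniform online average: after this update, x == mean(z_history).
        c = 1.0 / (t + 1)
        x = (1.0 - c) * x + c * z
    return x, z_history


if __name__ == "__main__":
    x_final, zs = schedule_free_sgd(1.0, steps=1000)
    print("averaged iterate x:", x_final)
    print("last base iterate z:", zs[-1])
```

The evaluated weights `x` are a running uniform average of the base iterates `z`, which is how averaging takes the place of an explicit decay phase, while gradients are taken at the interpolation point `y` between the two sequences.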
