ARTFEED — Contemporary Art Intelligence

ScheduleFree+ Outperforms WSD Schedules in LLM Training

ai-technology · 2026-05-20

A new machine learning method, ScheduleFree+, extends Schedule-Free Learning to large language models (LLMs) by fixing problems that appear at larger batch sizes and model sizes. The method eliminates the need for learning rate schedules and outperforms Warmup-Stable-Decay (WSD) schedules. At 1000 tokens per parameter, it is reported to outperform state-of-the-art schedules by 31%. The approach also provides a theoretical foundation for model averaging and checkpoint merging during pretraining.
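For readers unfamiliar with the baseline, the sketch below shows the general shape of a Warmup-Stable-Decay schedule: a linear ramp up, a flat plateau, then a ramp down. It is an illustration only. The function name `wsd_lr`, the linear decay shape, and the step counts are assumptions chosen for this example, not settings taken from the paper.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_steps=50, decay_steps=200):
    """Learning rate at `step` under a simple warmup-stable-decay shape.

    Hypothetical helper: linear warmup, constant plateau, linear decay to zero.
    Real WSD recipes may use other decay shapes or a nonzero final rate.
    """
    decay_start = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup: ramp linearly from peak_lr / warmup_steps up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay: ramp linearly from peak_lr down to zero at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / decay_steps


# Sample the schedule at a few points of a 1000-step run.
samples = [wsd_lr(s, total_steps=1000, peak_lr=1.0) for s in (0, 49, 500, 900, 1000)]
print(samples)
```

The point of the comparison is that a schedule like this must know `total_steps` in advance to place the decay phase, which is the dependency a schedule-free method aims to remove.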

Key facts

  • ScheduleFree+ is a learning-rate-free and schedule-free method for training LLMs.
  • It outperforms Warmup-Stable-Decay (WSD) schedules.
  • At 1000 tokens per parameter, it outperforms state-of-the-art (SOTA) schedules by 31%.
  • Schedule-Free Learning has shown success across dozens of standard benchmark problems.
  • Strong performance for LLM training had previously been demonstrated only at small scales.
  • The method provides a theoretical foundation for model averaging and checkpoint merging.
  • The paper identifies the fixes needed to scale Schedule-Free Learning to larger batch sizes and model sizes.
  • Schedule-Free Learning is most effective for long-duration training.
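To make the link between Schedule-Free Learning and model averaging concrete, here is a minimal sketch of the original Schedule-Free SGD update, on which ScheduleFree+ builds. It does not include the ScheduleFree+ fixes, which this summary does not describe. The function name `schedule_free_sgd`, the toy quadratic objective, and the hyperparameter values are assumptions for illustration.

```python
def schedule_free_sgd(grad, x0, lr=0.5, beta=0.9, steps=500):
    """Basic Schedule-Free SGD (not ScheduleFree+), using a constant lr.

    z: the plain SGD iterate.
    x: a uniform running average of the z iterates (the model you evaluate).
    y: an interpolation of z and x, where the gradient is computed.
    """
    z = list(x0)
    x = list(x0)
    for t in range(1, steps + 1):
        # Gradient is taken at a point between the average and the SGD iterate.
        y = [(1 - beta) * zi + beta * xi for zi, xi in zip(z, x)]
        g = grad(y)
        # Plain SGD step on z; no warmup or decay schedule is applied.
        z = [zi - lr * gi for zi, gi in zip(z, g)]
        # Fold the new z into the running average with weight 1 / (t + 1).
        c = 1.0 / (t + 1)
        x = [(1 - c) * xi + c * zi for xi, zi in zip(x, z)]
    return x


# Toy problem (assumed for illustration): minimize 0.5 * ||w - target||^2.
target = [3.0, -1.0]
result = schedule_free_sgd(lambda w: [wi - ti for wi, ti in zip(w, target)], [0.0, 0.0])
print(result)
```

The averaged sequence `x` plays the role that a decayed learning rate plays in a scheduled run, which is why the method can be read as a principled form of weight averaging during training.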

Entities

Institutions

  • arXiv
