ARTFEED — Contemporary Art Intelligence

Learning Rate Decay Undermines Curriculum-Based LLM Pretraining

ai-technology · 2026-04-27

A new arXiv paper (2511.18903) identifies a critical flaw in curriculum-based pretraining for large language models (LLMs): an incompatibility between ascending data-quality ordering and decaying learning-rate (LR) schedules. While curriculum training outperforms random shuffling under a constant LR, its advantage vanishes under standard LR decay: an ascending-quality curriculum places the scarcest, highest-quality data at the end of training, which is exactly when the decayed LR is smallest, so the model cannot fully exploit it. The authors propose two simple mitigations: a more moderate LR decay, or an adjusted curriculum schedule. The study underscores that high-quality data is scarce and that naive curriculum strategies waste its potential.
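The interaction can be illustrated with a minimal sketch (not the paper's code; the tier boundaries, step count, and LR values are hypothetical): under a standard cosine decay, an ascending-quality curriculum means the highest-quality tier is only ever trained on at the smallest learning rates.

```python
import math

def cosine_lr(step, total_steps, lr_max=3e-4, lr_min=3e-5):
    """Standard cosine decay from lr_max down to lr_min."""
    frac = step / max(total_steps - 1, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * frac))

total = 1000
# Ascending-quality curriculum: low-quality data first, high-quality last.
tiers = {
    "low": range(0, 333),
    "mid": range(333, 666),
    "high": range(666, 1000),
}
for name, steps in tiers.items():
    mean_lr = sum(cosine_lr(s, total) for s in steps) / len(steps)
    print(f"{name}-quality tier: mean LR = {mean_lr:.2e}")
```

The printout shows the mean LR falling monotonically from the low-quality tier to the high-quality tier, which is the mismatch the paper's mitigations (gentler decay, or reshuffled curriculum timing) aim to correct.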

Key facts

  • arXiv paper 2511.18903 identifies incompatibility between ascending data quality order and decaying learning rate schedules in curriculum-based LLM pretraining
  • Curriculum training outperforms random shuffling under constant learning rate
  • The advantage of curriculum training diminishes under standard LR decay schedules
  • Two mitigation strategies proposed: more moderate LR decay or adjusted curriculum schedule
  • High-quality data scarcity motivates curriculum-based pretraining
  • Prior studies reported limited improvements from curriculum-based pretraining
  • Experiments show the incompatibility can be mitigated by either of the two proposed strategies
  • Paper is a replace-cross announcement on arXiv

Entities

Institutions

  • arXiv

Sources