ARTFEED — Contemporary Art Intelligence

Five Generations of Learning Rate Scheduling Systematized in New Paper

publication · 2026-05-01

A new preprint on arXiv (2604.27295) organizes learning rate scheduling into five generations: global fixed rates, global scheduling, parameter-level adaptation, layer-level differentiation, and joint layer-time scheduling. The paper introduces a framework called Discriminative Adaptive Layer Scaling (DALS), which combines phase-adaptive cosine scheduling with depth-aware Grokfast gradient filtering and LARS-style trust ratios. It benchmarks 18 optimizers across a range of tasks and targets what it calls the 'impossible trinity' of transfer learning, in which lower layers need small updates to preserve general knowledge while upper layers need larger updates to adapt to the new task.
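To make the idea of joint layer-time scheduling concrete, here is a minimal Python sketch. It is not the paper's DALS implementation: the function name `layer_time_lr`, the geometric `layer_decay` factor, and the plain (not phase-adaptive) cosine curve are all assumptions chosen for illustration. It only shows how a learning rate can depend on both layer depth and training step at once.

```python
import math


def layer_time_lr(base_lr, layer, num_layers, step, total_steps, layer_decay=0.8):
    """Hypothetical joint layer-time schedule (illustrative, not from the paper).

    layer 0 is the lowest layer; layer num_layers - 1 is the top layer.
    """
    if not 0 <= layer < num_layers:
        raise ValueError("layer must be in [0, num_layers)")
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")

    # Depth factor: the top layer keeps the full rate, and each layer
    # below it is scaled down by another factor of layer_decay.
    depth_scale = layer_decay ** (num_layers - 1 - layer)

    # Time factor: standard cosine decay from 1 at step 0 to 0 at total_steps.
    progress = min(max(step / total_steps, 0.0), 1.0)
    time_scale = 0.5 * (1.0 + math.cos(math.pi * progress))

    return base_lr * depth_scale * time_scale


if __name__ == "__main__":
    # Print the rate for each of 4 layers at the start, middle, and end of training.
    for step in (0, 50, 100):
        rates = [layer_time_lr(1e-3, layer, 4, step, 100) for layer in range(4)]
        print(step, ["%.2e" % r for r in rates])
```

Under these assumptions, lower layers always receive a smaller rate than upper layers at the same step, and every layer's rate decays toward zero as training ends.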

Key facts

  • arXiv preprint 2604.27295 systematizes learning rate scheduling into five generations.
  • Generations: Gen1 global fixed, Gen2 global scheduling, Gen3 parameter-level, Gen4 layer-level, Gen5 joint layer-time.
  • Proposes the Discriminative Adaptive Layer Scaling (DALS) framework, integrating phase-adaptive cosine scheduling, depth-aware Grokfast gradient filtering, and LARS-style trust ratios.
  • Benchmarks 18 optimizers across tasks.
  • Addresses the impossible trinity of transfer learning.
  • Lower layers require small updates to preserve general knowledge.
  • Upper layers need larger updates to adapt to new tasks.
  • Paper available at https://arxiv.org/abs/2604.27295.
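
Two of the ingredients named above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's code: the helper names `trust_ratio` and `ema_filter` are hypothetical, the trust ratio follows the general LARS idea of scaling a layer's step by the ratio of its weight norm to its gradient norm (weight decay omitted), and the filter follows the general Grokfast idea of amplifying an exponential moving average of past gradients. The depth-aware variant described in the preprint is not reproduced here.

```python
import math


def trust_ratio(weights, grads, eps=1e-8):
    """LARS-style per-layer trust ratio: ||w|| / ||g|| (illustrative helper)."""
    w_norm = math.sqrt(sum(w * w for w in weights))
    g_norm = math.sqrt(sum(g * g for g in grads))
    # Fall back to a neutral ratio when either norm is zero.
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return w_norm / (g_norm + eps)


def ema_filter(grads, ema, alpha=0.98, lamb=2.0):
    """Grokfast-style filter: add an amplified EMA of gradients to the raw gradient.

    alpha and lamb are assumed example values, not settings from the paper.
    Returns (filtered_grads, updated_ema).
    """
    new_ema = [alpha * m + (1.0 - alpha) * g for m, g in zip(ema, grads)]
    filtered = [g + lamb * m for g, m in zip(grads, new_ema)]
    return filtered, new_ema


if __name__ == "__main__":
    weights = [3.0, 4.0]
    grads = [0.3, 0.4]
    ema = [0.0, 0.0]
    filtered, ema = ema_filter(grads, ema)
    # A layer-wise step would then scale the filtered gradient by the trust ratio.
    print(trust_ratio(weights, filtered), filtered)
```

In a layer-wise optimizer, each layer's effective step would be its scheduled rate multiplied by its own trust ratio, so layers with large weights and small gradients are not starved of updates.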

Entities

Institutions

  • arXiv
