Five Generations of Learning Rate Scheduling Systematized in New Paper
A new preprint on arXiv (2604.27295) organizes learning rate scheduling into five generations: global fixed rates, global scheduling, parameter-level adaptation, layer-level differentiation, and joint layer-time scheduling. The paper introduces a framework called Discriminative Adaptive Layer Scaling (DALS), which combines phase-adaptive cosine scheduling with depth-aware Grokfast gradient filtering and LARS-style trust ratios. It benchmarks 18 optimizers across a range of tasks and targets what it calls the 'impossible trinity' of transfer learning, in which lower layers need small updates to preserve general knowledge while upper layers need larger updates to adapt to the new task.
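To make the fifth generation concrete, here is a minimal sketch of a joint layer-time learning rate, where the rate depends on both the layer's depth and the training step. It is not the DALS formula from the paper: the geometric per-layer `decay`, the plain (not phase-adaptive) cosine curve, and the function names are all assumptions chosen for illustration.

```python
import math


def cosine_factor(step: int, total_steps: int) -> float:
    """Plain cosine annealing multiplier: 1.0 at step 0, 0.0 at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


def layer_time_lr(base_lr: float, layer: int, num_layers: int,
                  step: int, total_steps: int, decay: float = 0.8) -> float:
    """Learning rate for `layer` (0 = lowest) at training step `step`.

    Assumption: each layer below the top one is scaled down by `decay`,
    a common discriminative fine-tuning heuristic, so lower layers
    receive smaller updates than upper layers at every step.
    """
    depth_scale = decay ** (num_layers - 1 - layer)
    return base_lr * depth_scale * cosine_factor(step, total_steps)


if __name__ == "__main__":
    # Print the per-layer rates at the start, middle, and end of training.
    for step in (0, 50, 100):
        rates = [layer_time_lr(1e-3, layer, 4, step, 100) for layer in range(4)]
        print(step, ["%.2e" % r for r in rates])
```

In this sketch the top layer starts at the full base rate and the lowest layer at a fraction of it, and every layer's rate anneals to zero by the final step.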
Key facts
- arXiv preprint 2604.27295 systematizes learning rate scheduling into five generations.
- Generations: Gen1 global fixed, Gen2 global scheduling, Gen3 parameter-level, Gen4 layer-level, Gen5 joint layer-time.
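- Proposes the Discriminative Adaptive Layer Scaling (DALS) framework, which integrates phase-adaptive cosine scheduling, depth-aware Grokfast gradient filtering, and LARS-style trust ratios.
- Benchmarks 18 optimizers across a range of tasks.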
- Addresses the impossible trinity of transfer learning.
- Lower layers require small updates to preserve general knowledge.
- Upper layers need larger updates to adapt to new tasks.
- Paper available at https://arxiv.org/abs/2604.27295.
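The two gradient-side ingredients named in the key facts, Grokfast filtering and LARS trust ratios, can be sketched in a few lines of plain Python. This follows the generally published forms of the two techniques (an exponential moving average of gradients added back to the raw gradient, and a weight-norm to gradient-norm ratio), not the depth-aware variant described in the paper, whose details are not given here. The function names, default values, and list-based tensors are illustrative assumptions.

```python
import math


def grokfast_ema(grad, ema, alpha=0.98, lamb=2.0):
    """Grokfast-style EMA filter: amplify the slow-moving gradient component.

    Returns (filtered_grad, new_ema). `alpha` and `lamb` are illustrative
    defaults, not values taken from the DALS paper.
    """
    new_ema = [alpha * m + (1.0 - alpha) * g for m, g in zip(ema, grad)]
    filtered = [g + lamb * m for g, m in zip(grad, new_ema)]
    return filtered, new_ema


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in values))


def lars_trust_ratio(weights, grad, trust_coef=0.001,
                     weight_decay=0.0, eps=1e-12):
    """LARS-style local scale: trust_coef * ||w|| / (||g|| + wd * ||w||).

    Falls back to 1.0 when either norm is zero, so the layer keeps the
    unscaled learning rate.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grad)
    if w_norm == 0.0 or g_norm == 0.0:
        return 1.0
    return trust_coef * w_norm / (g_norm + weight_decay * w_norm + eps)


if __name__ == "__main__":
    weights = [3.0, 4.0]   # one layer's parameters, flattened
    grad = [0.0, 5.0]      # raw gradient for that layer
    ema = [0.0, 0.0]       # filter state, kept per layer between steps
    filtered, ema = grokfast_ema(grad, ema)
    print(filtered, lars_trust_ratio(weights, filtered))
```

A per-layer update would then multiply the scheduled learning rate by the trust ratio and apply it to the filtered gradient; how DALS weights these pieces by depth is left to the paper.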
Entities
Institutions
- arXiv