New Method Calibrates Adam Optimizer for LLMs Using Signal-to-Noise Ratio
A new method, Module-wise Learning Rate Scaling via SNR (MoLS), addresses gradient heterogeneity in large language models by estimating module-level signal-to-noise ratios and using them to scale Adam optimizer updates. The approach, detailed in arXiv:2605.05794, automates module-wise learning rate allocation without manual tuning, aiming to improve convergence and stability when training LLMs composed of heterogeneous modules.
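The abstract does not spell out MoLS's exact SNR estimator or scaling rule, but the general idea can be sketched. The code below is a minimal illustration, assuming the module-level SNR is estimated from a window of recent gradient samples as the norm of the mean gradient over the average noise norm, and that per-module learning rates are scaled in proportion to SNR around a shared base rate; the function names and the normalization choice are hypothetical, not from the paper.

```python
import numpy as np

def module_snr(grads, eps=1e-8):
    """Estimate a module-level signal-to-noise ratio from a window of
    gradient samples (one row per optimization step).

    SNR here = ||mean gradient|| / mean ||per-step deviation||.
    This estimator is an assumption; the paper may define SNR differently.
    """
    mean_g = grads.mean(axis=0)                      # signal: averaged gradient
    noise = grads - mean_g                           # per-step fluctuation
    signal = np.linalg.norm(mean_g)
    noise_mag = np.linalg.norm(noise, axis=1).mean()
    return signal / (noise_mag + eps)

def scale_learning_rates(base_lr, snrs):
    """Allocate per-module learning rates proportional to each module's SNR,
    normalized by the mean SNR so the average rate stays at base_lr.
    (Hypothetical allocation rule for illustration.)"""
    mean_snr = np.mean(list(snrs.values()))
    return {name: base_lr * s / mean_snr for name, s in snrs.items()}

# Simulated gradient histories for two modules (hypothetical data):
# "attention" has a strong, consistent gradient direction; "embedding"
# gradients are dominated by noise.
rng = np.random.default_rng(0)
grads = {
    "attention": rng.normal(1.0, 0.1, size=(32, 64)),
    "embedding": rng.normal(0.0, 1.0, size=(32, 64)),
}
snrs = {name: module_snr(g) for name, g in grads.items()}
lrs = scale_learning_rates(1e-3, snrs)
```

In a real training loop the resulting `lrs` would be assigned to the optimizer's parameter groups (in PyTorch, one Adam parameter group per module), giving high-SNR modules larger effective step sizes without any manually tuned module-specific rates.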
Key facts
- arXiv:2605.05794 introduces MoLS
- MoLS estimates module-level SNRs
- MoLS scales Adam updates automatically
- Addresses gradient heterogeneity in LLMs
- Aims to improve convergence and stability
- No manual module-specific learning rates needed
- Published on arXiv