AdaMeZO: Memory-Efficient Zeroth-Order Optimizer for LLM Fine-Tuning
AdaMeZO is a novel zeroth-order optimizer designed for fine-tuning large language models (LLMs) without the memory overhead of storing Adam's first- and second-moment estimates. Backpropagation-based fine-tuning requires substantial GPU memory for gradients and optimizer state; MeZO avoids this by using only forward passes, but it converges more slowly. AdaMeZO incorporates Adam-style moment estimates without keeping them in memory, aiming for faster convergence while preserving MeZO's memory footprint. The paper supports the method with theoretical analysis and experimental validation.
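For context, below is a minimal sketch of the MeZO-style zeroth-order step that this line of work builds on: the Gaussian perturbation is regenerated from a stored seed, so each step needs only forward passes and a single scalar. The function name, hyperparameters, and the plain SGD-style final update are illustrative assumptions; the Adam-style rescaling that AdaMeZO applies is not reproduced here.

```python
import torch


@torch.no_grad()  # no autograd graph: forward passes only
def mezo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6):
    """One MeZO-style SPSA step with seed-based in-place perturbations.

    Hypothetical helper for illustration; `loss_fn(model, batch)` is assumed
    to return a scalar loss.
    """
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Regenerate the same Gaussian direction z from the seed and apply
        # theta <- theta + scale * eps * z in place (no extra tensors kept).
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    perturb(+1)                      # theta + eps * z
    loss_plus = loss_fn(model, batch)
    perturb(-2)                      # theta - eps * z
    loss_minus = loss_fn(model, batch)
    perturb(+1)                      # restore theta

    # Projected gradient estimate: a single scalar per step.
    grad_scalar = float(loss_plus - loss_minus) / (2 * eps)

    # Plain SGD-style update along the regenerated direction; AdaMeZO would
    # instead rescale this with Adam-style moment estimates.
    torch.manual_seed(seed)
    for p in model.parameters():
        z = torch.randn_like(p)
        p.data.add_(-lr * grad_scalar * z)
```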
Key facts
- AdaMeZO is a zeroth-order optimizer for LLM fine-tuning.
- It leverages Adam-style first- and second-moment estimates without maintaining them in memory.
- MeZO reduces GPU memory by using only forward passes.
- Adam stores first- and second-moment buffers for every parameter, roughly tripling the memory needed for the weights alone (a rough arithmetic sketch follows this list).
- AdaMeZO aims to combine the memory efficiency of MeZO with the convergence speed of Adam.
- The paper includes theoretical analysis and extensive experiments.
- The work is published on arXiv with ID 2605.00650.
- Fine-tuning is typically required to adapt pretrained LLMs to downstream tasks.
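As a rough illustration of the memory point above (the 7B parameter count and fp32 precision are illustrative assumptions, not figures from the paper):

```python
params = 7e9           # illustrative 7B-parameter model
bytes_per_value = 4    # fp32

weights_gb = params * bytes_per_value / 1e9   # ~28 GB for the weights
adam_moments_gb = 2 * weights_gb              # first + second moment buffers, ~56 GB

print(f"weights only:         {weights_gb:.0f} GB")
print(f"weights + Adam state: {weights_gb + adam_moments_gb:.0f} GB  (~3x)")
```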
Entities
Institutions
- arXiv