AdaMeZO: Memory-Efficient Zeroth-Order Optimizer for LLM Fine-Tuning
AdaMeZO is a novel zeroth-order optimizer designed for fine-tuning large language models (LLMs) without the memory overhead of storing Adam's first- and second-moment estimates. Backpropagation-based fine-tuning requires substantial GPU memory for gradients and optimizer state; MeZO avoids this by using only forward passes, but it converges more slowly. AdaMeZO incorporates Adam-style moment estimates without keeping them in memory, aiming for faster convergence while preserving MeZO's memory footprint. The paper supports the method with theoretical analysis and experimental validation.
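For context, below is a minimal sketch of the MeZO-style zeroth-order step that this line of work builds on: the Gaussian perturbation is regenerated from a stored seed, so each step needs only forward passes and a single scalar. The function name, hyperparameters, and the plain SGD-style final update are illustrative assumptions; the Adam-style rescaling that AdaMeZO applies is not reproduced here.

```python
import torch


@torch.no_grad()  # no autograd graph: forward passes only
def mezo_step(model, loss_fn, batch, eps=1e-3, lr=1e-6):
    """One MeZO-style SPSA step with seed-based in-place perturbations.

    Hypothetical helper for illustration; `loss_fn(model, batch)` is assumed
    to return a scalar loss.
    """
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        # Regenerate the same Gaussian direction z from the seed and apply
        # theta <- theta + scale * eps * z in place (no extra tensors kept).
        torch.manual_seed(seed)
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    perturb(+1)                      # theta + eps * z
    loss_plus = loss_fn(model, batch)
    perturb(-2)                      # theta - eps * z
    loss_minus = loss_fn(model, batch)
    perturb(+1)                      # restore theta

    # Projected gradient estimate: a single scalar per step.
    grad_scalar = float(loss_plus - loss_minus) / (2 * eps)

    # Plain SGD-style update along the regenerated direction; AdaMeZO would
    # instead rescale this with Adam-style moment estimates.
    torch.manual_seed(seed)
    for p in model.parameters():
        z = torch.randn_like(p)
        p.data.add_(-lr * grad_scalar * z)
```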
Key facts
- AdaMeZO is a zeroth-order optimizer for LLM fine-tuning.
- It leverages Adam-style first- and second-moment estimates without maintaining them in memory.
- MeZO reduces GPU memory by using only forward passes.
- Adam stores first- and second-moment buffers for every parameter, roughly tripling the memory needed for the weights alone (a rough arithmetic sketch follows this list).
- AdaMeZO aims to combine the memory efficiency of MeZO with the convergence speed of Adam.
- The paper includes theoretical analysis and extensive experiments.
- The work is published on arXiv with ID 2605.00650.
- Fine-tuning is typically required to adapt pretrained LLMs to downstream tasks.
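As a rough illustration of the memory point above (the 7B parameter count and fp32 precision are illustrative assumptions, not figures from the paper):

```python
params = 7e9           # illustrative 7B-parameter model
bytes_per_value = 4    # fp32

weights_gb = params * bytes_per_value / 1e9   # ~28 GB for the weights
adam_moments_gb = 2 * weights_gb              # first + second moment buffers, ~56 GB

print(f"weights only:         {weights_gb:.0f} GB")
print(f"weights + Adam state: {weights_gb + adam_moments_gb:.0f} GB  (~3x)")
```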
Entities
Institutions
- arXiv