ARTFEED — Contemporary Art Intelligence

AdaMeZO: Memory-Efficient Zeroth-Order Optimizer for LLM Fine-Tuning

ai-technology · 2026-05-04

AdaMeZO is a zeroth-order optimizer for fine-tuning large language models (LLMs) that avoids the memory overhead of storing Adam's first and second moment estimates. Backpropagation-based fine-tuning requires substantial GPU memory for gradients and optimizer state; MeZO cuts this cost by estimating gradients from forward passes alone, but it converges more slowly. AdaMeZO incorporates Adam-style moment estimates without materializing them in memory, aiming for faster convergence while preserving MeZO's memory footprint. The paper supports the method with theoretical analysis and experimental validation.
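To make the mechanism concrete, here is a minimal PyTorch sketch of the MeZO-style estimator that AdaMeZO builds on: the gradient is approximated from two forward passes, and the random perturbation is regenerated from a saved seed instead of being stored. The names loss_fn, eps, and lr are illustrative placeholders, and the single-query SPSA step is simplified relative to the paper's setup.

    import torch

    @torch.no_grad()
    def mezo_step(model, loss_fn, eps=1e-3, lr=1e-6):
        """One SPSA-style zeroth-order update: two forward passes, no backprop."""
        seed = torch.randint(0, 2**31 - 1, (1,)).item()

        def perturb(scale):
            # Regenerate the identical Gaussian perturbation from the saved
            # seed, so z never has to be stored alongside the parameters.
            gen = torch.Generator().manual_seed(seed)
            for p in model.parameters():
                z = torch.randn(p.shape, generator=gen)
                p.add_(scale * eps * z.to(p.device))

        perturb(+1.0)                      # theta + eps * z
        loss_plus = loss_fn(model)
        perturb(-2.0)                      # theta - eps * z
        loss_minus = loss_fn(model)
        perturb(+1.0)                      # restore theta

        # The whole gradient estimate collapses to one scalar.
        proj_grad = (loss_plus - loss_minus) / (2 * eps)

        # SGD-style update along the regenerated perturbation direction.
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen)
            p.add_(-lr * proj_grad * z.to(p.device))
        return (loss_plus + loss_minus) / 2

Because only the seed and the scalar proj_grad survive each step, peak memory stays near that of inference; the per-step cost is the extra forward pass and the regeneration of z.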

Key facts

  • AdaMeZO is a zeroth-order optimizer for LLM fine-tuning.
  • It leverages Adam-style first- and second-moment estimates without maintaining them in memory (a sketch of one such scheme follows this list).
  • MeZO reduces GPU memory by using only forward passes.
  • Adam stores two parameter-sized moment tensors, roughly tripling parameter-related memory.
  • AdaMeZO aims to combine the memory efficiency of MeZO with the convergence speed of Adam.
  • The paper includes theoretical analysis and extensive experiments.
  • The work is published on arXiv with ID 2605.00650.
  • Fine-tuning LLMs is often necessary to adapt them to downstream tasks.
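The paper's exact construction of moment estimates without moment tensors is not detailed in this summary. Purely as an assumed illustration of how such a scheme can work, the sketch below keeps Adam-style first and second moments of the scalar projected gradient (O(1) memory) and regenerates the perturbation direction from its seed; ScalarAdamState and adaptive_zo_update are hypothetical names, not the paper's API.

    import torch

    class ScalarAdamState:
        """O(1)-memory Adam-style moments over the projected gradient."""
        def __init__(self, beta1=0.9, beta2=0.999, eps=1e-8):
            self.m = 0.0      # first-moment estimate of proj_grad
            self.v = 0.0      # second-moment estimate of proj_grad
            self.t = 0
            self.beta1, self.beta2, self.eps = beta1, beta2, eps

        def step_size(self, proj_grad, lr):
            self.t += 1
            self.m = self.beta1 * self.m + (1 - self.beta1) * proj_grad
            self.v = self.beta2 * self.v + (1 - self.beta2) * proj_grad**2
            m_hat = self.m / (1 - self.beta1**self.t)   # bias correction
            v_hat = self.v / (1 - self.beta2**self.t)
            return lr * m_hat / (v_hat**0.5 + self.eps)

    @torch.no_grad()
    def adaptive_zo_update(model, seed, proj_grad, state, lr=1e-6):
        # Regenerate z from the seed and apply the adaptively scaled step;
        # no parameter-sized optimizer state is ever held in memory.
        step = state.step_size(float(proj_grad), lr)
        gen = torch.Generator().manual_seed(seed)
        for p in model.parameters():
            z = torch.randn(p.shape, generator=gen)
            p.add_(-step * z.to(p.device))

In this illustration the extra optimizer state is three Python scalars, so the adaptive scaling costs essentially no memory beyond plain MeZO.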

Entities

Institutions

  • arXiv

Sources