ARTFEED — Contemporary Art Intelligence

Rank-1 Zeroth-Order Queries Optimize High-Rank LoRA Fine-Tuning

ai-technology · 2026-05-20

A recent arXiv preprint (2605.19767v1) examines a rank paradox that arises when zeroth-order (ZO) optimization is combined with LoRA to fine-tune large language models. ZO optimization removes the need to store backpropagation activations, while LoRA supplies compact, trainable adapters. Increasing the LoRA rank raises adapter capacity, but standard two-point ZO then either perturbs a rank-dependent number of coordinates or, when updates are made atomwise, leaves the finite-difference signal unobservable. The authors frame the bottleneck as a measurement-topology problem rather than a need for an external subspace. LoRA can be decomposed into matched rank-1 atoms, each a complete factor-coordinate block of dimension d_out + d_in. Querying one atom per step keeps the stored adapter rank at r while removing r from the single-query perturbation dimension. The naive atomwise query is still miscalibrated, however: it inherits the canonical LoRA scaling α/r, so the active finite-difference signal shrinks as 1/r.
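
The 1/r shrinkage can be made explicit with a short derivation. This is a sketch in standard LoRA notation, which is assumed here because the summary does not give the paper's own symbols: W_0 is the frozen weight, B and A are the adapter factors with columns b_i and rows a_i, (u, v) is a hypothetical query direction on atom i, and ε is the finite-difference step.

```latex
% LoRA update written as a sum of r matched rank-1 atoms
W = W_0 + \frac{\alpha}{r} B A
  = W_0 + \frac{\alpha}{r} \sum_{i=1}^{r} b_i a_i^{\top},
\qquad b_i \in \mathbb{R}^{d_{\mathrm{out}}},\; a_i \in \mathbb{R}^{d_{\mathrm{in}}}

% Two-point query on atom i only: (b_i, a_i) -> (b_i \pm \epsilon u,\; a_i \pm \epsilon v)
W_{+} - W_{-}
  = \frac{\alpha}{r} \left[ (b_i + \epsilon u)(a_i + \epsilon v)^{\top}
                          - (b_i - \epsilon u)(a_i - \epsilon v)^{\top} \right]
  = \frac{\alpha}{r} \cdot 2\epsilon \left( u a_i^{\top} + b_i v^{\top} \right)
```

The query touches only the d_out + d_in coordinates of one atom, but the resulting weight difference carries the prefactor α/r, so for fixed α and ε it scales as 1/r.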

Key facts

  • Paper arXiv:2605.19767v1 addresses rank paradox in ZO optimization with LoRA.
  • Zeroth-order optimization enables fine-tuning without storing backpropagation activations.
  • LoRA supplies compact trainable adapters for large-language-model fine-tuning.
  • Increasing LoRA rank improves adapter capacity but creates issues with standard two-point ZO.
  • Standard two-point ZO either perturbs a rank-dependent number of coordinates or makes signal unobservable.
  • Bottleneck identified as a measurement-topology problem rather than need for external subspace.
  • LoRA decomposes into matched rank-1 atoms, each a complete factor-coordinate block of dimension d_out + d_in.
  • Querying one atom per step keeps stored adapter rank r while removing r from perturbation dimension.
  • Naive atomwise query inherits canonical LoRA scaling α/r, causing signal to shrink as 1/r.
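
The facts above can be checked numerically on a toy problem. The following is a minimal NumPy sketch, not the paper's method: the function names `lora_delta` and `atomwise_fd_signal`, the linear toy loss, the dimensions, and α = 16 are all assumptions chosen for illustration. It verifies that the adapter equals a sum of rank-1 atoms and that a two-point query on a single atom yields a finite-difference signal proportional to 1/r.

```python
import numpy as np

def lora_delta(B, A, alpha):
    """Canonical LoRA update (alpha / r) * B @ A, where r is the number of atoms."""
    r = B.shape[1]
    return (alpha / r) * (B @ A)

def atomwise_fd_signal(r, d_out=8, d_in=6, alpha=16.0, eps=1e-3):
    """Two-point finite difference of a toy loss when only atom 0 is perturbed."""
    rng = np.random.default_rng(0)
    # Drawn first and in a fixed order, so these are identical for every r.
    G = rng.standard_normal((d_out, d_in))  # toy loss: L(dW) = <G, dW>
    b0, a0 = rng.standard_normal(d_out), rng.standard_normal(d_in)
    u, v = rng.standard_normal(d_out), rng.standard_normal(d_in)  # query direction

    # The remaining r - 1 atoms are arbitrary and are never perturbed.
    B = np.column_stack([b0, rng.standard_normal((d_out, r - 1))])
    A = np.vstack([a0, rng.standard_normal((r - 1, d_in))])

    def loss(sign):
        Bp, Ap = B.copy(), A.copy()
        Bp[:, 0] += sign * eps * u  # d_out coordinates of atom 0
        Ap[0, :] += sign * eps * v  # d_in coordinates of atom 0
        return float(np.sum(G * lora_delta(Bp, Ap, alpha)))

    # Central (two-point) difference along the atom-0 direction.
    return (loss(+1) - loss(-1)) / (2 * eps)

# The adapter product B @ A is exactly a sum of r rank-1 atoms b_i a_i^T.
rng = np.random.default_rng(1)
B, A = rng.standard_normal((8, 4)), rng.standard_normal((4, 6))
atoms = sum(np.outer(B[:, i], A[i, :]) for i in range(4))
assert np.allclose(B @ A, atoms)

signals = {r: atomwise_fd_signal(r) for r in (1, 2, 4, 8)}
for r, s in signals.items():
    print(f"r={r}: signal={s:+.6f}  r*signal={r * s:+.6f}")
```

In this toy setup the `r*signal` column should be the same for every r up to floating-point rounding, which is the 1/r shrinkage in isolation. A linear loss was chosen so that the second-order terms cancel exactly in the central difference; with a real model loss the scaling would hold only to first order in the step size.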

Entities

Institutions

  • arXiv
