Rank-1 Zeroth-Order Queries Optimize High-Rank LoRA Fine-Tuning
A recent study published on arXiv (2605.19767v1) examines the rank paradox that arises when zeroth-order (ZO) optimization is combined with LoRA to fine-tune large language models. ZO optimization avoids storing backpropagation activations, while LoRA supplies compact trainable adapters. Increasing the LoRA rank raises adapter capacity, but standard two-point ZO then either perturbs a number of coordinates that grows with the rank or, when updates are made atom by atom, leaves the finite-difference signal unobservable.
The authors frame this bottleneck as a measurement-topology problem rather than a need for an external subspace. A LoRA adapter decomposes into matched rank-1 atoms, each a complete factor-coordinate block of dimension d_out + d_in. Querying one atom per step keeps the stored adapter rank r while removing r from the single-query perturbation dimension. The naive atomwise query is still miscalibrated, however: it inherits the canonical LoRA scaling α/r, so the active finite-difference signal shrinks as 1/r.
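The rank-1 atom view rests on the column-row expansion of a matrix product: (α/r)·BA = Σ_i (α/r)·b_i a_iᵀ, where b_i is column i of B and a_i is row i of A. The stdlib-only Python sketch below checks that identity numerically and counts how many coordinates a single two-point query touches when it perturbs the whole adapter versus one atom. It assumes the canonical LoRA shapes (B is d_out×r, A is r×d_in); the helper names `lora_delta`, `lora_delta_from_atoms` and `perturbation_dims` are hypothetical and are not taken from the paper.

```python
import random

# Assumed canonical LoRA shapes: B is d_out x r, A is r x d_in, and the
# adapter update is (alpha / r) * B @ A. All helper names are hypothetical.

def matmul(B, A):
    """Plain matrix product of B (d_out x r) and A (r x d_in)."""
    r, d_in = len(A), len(A[0])
    return [[sum(row[k] * A[k][j] for k in range(r)) for j in range(d_in)]
            for row in B]

def lora_delta(B, A, alpha):
    """Canonical LoRA update (alpha / r) * B @ A, computed as one product."""
    scale = alpha / len(A)
    return [[scale * x for x in row] for row in matmul(B, A)]

def lora_delta_from_atoms(B, A, alpha):
    """The same update as a sum of r rank-1 atoms (alpha / r) * b_i a_i^T,
    where atom i pairs column i of B with row i of A."""
    r, d_out, d_in = len(A), len(B), len(A[0])
    scale = alpha / r
    total = [[0.0] * d_in for _ in range(d_out)]
    for i in range(r):
        b_i = [B[p][i] for p in range(d_out)]  # column i of B
        a_i = A[i]                              # row i of A
        for p in range(d_out):
            for q in range(d_in):
                total[p][q] += scale * b_i[p] * a_i[q]
    return total

def perturbation_dims(d_out, d_in, r):
    """Coordinates touched by one two-point query: whole adapter vs one atom."""
    return {"full_adapter": r * (d_out + d_in), "one_atom": d_out + d_in}

rng = random.Random(0)
d_out, d_in, r, alpha = 4, 3, 8, 16.0
B = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]
A = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]

full = lora_delta(B, A, alpha)
atoms = lora_delta_from_atoms(B, A, alpha)
max_err = max(abs(x - y) for rf, ra in zip(full, atoms) for x, y in zip(rf, ra))
print(max_err < 1e-9)                     # True
print(perturbation_dims(d_out, d_in, r))  # {'full_adapter': 56, 'one_atom': 7}
```

The per-atom count d_out + d_in does not depend on r, which is the sense in which querying one atom removes r from the single-query perturbation dimension while the stored adapter keeps all r atoms.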
Key facts
- Paper arXiv:2605.19767v1 addresses the rank paradox in combining ZO optimization with LoRA.
- Zeroth-order optimization enables fine-tuning without storing backpropagation activations.
- LoRA supplies compact trainable adapters for large-language-model fine-tuning.
- Increasing LoRA rank improves adapter capacity but creates problems for standard two-point ZO.
- Standard two-point ZO either perturbs a rank-dependent number of coordinates or makes the finite-difference signal unobservable in atomwise updates.
- The bottleneck is identified as a measurement-topology problem rather than a need for an external subspace.
- LoRA decomposes into matched rank-1 atoms, each a complete factor-coordinate block of dimension d_out + d_in.
- Querying one atom per step keeps stored adapter rank r while removing r from perturbation dimension.
- Naive atomwise query inherits canonical LoRA scaling α/r, causing signal to shrink as 1/r.
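The 1/r shrinkage in the last fact can be seen with a toy linear probe loss. The Python sketch below implements a naive atomwise two-point query as we read it from this summary: perturb only (b_i, a_i) by ±ε along a fixed direction (u, v), map the atom into weight space through the canonical scale α/r, and take (L(+) − L(−))/(2ε). The linear loss ⟨G, ΔW⟩ and the fixed matrix G are assumptions chosen so the answer is exact; this is not the authors' calibrated estimator, which the summary does not specify, and `two_point_atom_query` and `probe_loss` are hypothetical names.

```python
import random

def two_point_atom_query(loss, b_i, a_i, u, v, eps, alpha, r):
    """Naive atomwise two-point ZO query (hypothetical sketch).

    Only atom i is perturbed: (b_i, a_i) -> (b_i +/- eps*u, a_i +/- eps*v).
    The atom reaches the weights through the canonical LoRA scale alpha / r,
    and the query returns (L(+) - L(-)) / (2 * eps).
    """
    scale = alpha / r

    def atom_update(sign):
        bp = [x + sign * eps * du for x, du in zip(b_i, u)]
        ap = [x + sign * eps * dv for x, dv in zip(a_i, v)]
        return [[scale * p * q for q in ap] for p in bp]  # (alpha/r) * b a^T

    return (loss(atom_update(+1)) - loss(atom_update(-1))) / (2 * eps)

rng = random.Random(1)
d_out, d_in, alpha, eps = 4, 3, 16.0, 1e-3
G = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]

def probe_loss(delta):
    """Toy linear loss <G, delta>; G is a hypothetical fixed 'gradient'."""
    return sum(g * d for g_row, d_row in zip(G, delta) for g, d in zip(g_row, d_row))

# Hold the atom and the query direction fixed; vary only the stored rank r.
b_i = [rng.gauss(0, 1) for _ in range(d_out)]
a_i = [rng.gauss(0, 1) for _ in range(d_in)]
u = [rng.gauss(0, 1) for _ in range(d_out)]
v = [rng.gauss(0, 1) for _ in range(d_in)]

signals = {r: two_point_atom_query(probe_loss, b_i, a_i, u, v, eps, alpha, r)
           for r in (1, 2, 4, 8, 16)}
for r, s in signals.items():
    # r * signal stays constant up to rounding, i.e. the signal shrinks as 1/r
    print(f"r={r:2d}  signal={s:+.6f}  r*signal={r * s:+.6f}")
```

For a linear loss the ε² terms cancel, so the query equals (α/r)·⟨G, u a_iᵀ + b_i vᵀ⟩ exactly (up to floating-point rounding). With α held fixed, doubling r halves the observed signal even though the atom and the query direction are unchanged.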
Entities
Institutions
- arXiv