GeoRA: Geometry-Aware Low-Rank Adaptation Enhances RLVR for Reasoning Models
GeoRA (Geometry-Aware Low-Rank Adaptation) is a new method for enhancing reinforcement learning with verifiable rewards (RLVR) in large-scale reasoning models. Existing low-rank adaptation techniques, such as PiSSA, are tailored to supervised fine-tuning (SFT) and do not account for the distinct optimization dynamics and geometric characteristics of RLVR. Moreover, directly fine-tuning the unstructured sparse parameter subspace that RLVR favors is inefficient on contemporary hardware. GeoRA instead exploits the anisotropic and compressible structure of the RL update subspace: it extracts the subspace's principal components, enabling effective adaptation while preserving the geometric structure established during pre-training. The method is described in a paper available on arXiv (2601.09361).
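The core idea of extracting principal components from a compressible update subspace can be illustrated with a small sketch. The snippet below is not the paper's algorithm; it is a generic illustration, assuming a hypothetical update matrix `delta_w` observed during RL and a LoRA-style factorization `B @ A`. It takes the top-`rank` singular directions of the update via SVD, which captures most of an anisotropic update's energy with few parameters:

```python
import numpy as np

def principal_lowrank_init(delta_w, rank):
    """Factor an update matrix into a rank-`rank` LoRA-style pair (B, A).

    This is an illustrative sketch of principal-component extraction,
    not the GeoRA algorithm itself.
    """
    # SVD of the observed update; the update's anisotropy shows up as
    # a fast-decaying singular-value spectrum.
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    # Keep only the top-`rank` principal directions, splitting the
    # singular values evenly between the two factors.
    B = u[:, :rank] * np.sqrt(s[:rank])
    A = np.sqrt(s[:rank])[:, None] * vt[:rank, :]
    return B, A

rng = np.random.default_rng(0)
# Simulate a compressible update: a rank-8 signal plus small noise.
delta = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64)) \
    + 0.01 * rng.normal(size=(64, 64))
B, A = principal_lowrank_init(delta, rank=8)
# The rank-8 factorization reconstructs the update almost exactly.
rel_err = np.linalg.norm(delta - B @ A) / np.linalg.norm(delta)
print(round(rel_err, 4))
```

Because the simulated update is dominated by eight directions, the rank-8 factorization reconstructs it with very small relative error; a genuinely unstructured sparse update would not compress this way, which is the inefficiency the summary alludes to.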
Key facts
- GeoRA is a low-rank adaptation method tailored for RLVR.
- RLVR is a key paradigm for improving large-scale reasoning models.
- Existing methods like PiSSA are designed for SFT, not RLVR.
- RLVR requires preservation of pre-trained geometric structures.
- Direct fine-tuning of unstructured sparse subspaces is inefficient.
- GeoRA exploits anisotropic and compressible structure of RL update subspace.
- The paper is available on arXiv with ID 2601.09361.
- The method extracts principal components from the RL update subspace.