ARTFEED — Contemporary Art Intelligence

LoRA Enhances Off-Policy Reinforcement Learning Critics via Low-Rank Adaptation

ai-technology · 2026-04-22

A recent study recasts Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy reinforcement learning critics. The approach targets a known failure mode: larger critics frequently overfit and destabilize during replay-buffer-based bootstrap training. By optimizing only low-rank adapters while keeping randomly initialized base matrices frozen, the method confines critic updates to a low-dimensional subspace. Building on SimbaV2, the researchers formulated LoRA so that SimbaV2's hyperspherical normalization geometry is preserved under frozen-backbone training. Evaluations with the SAC and FastTD3 algorithms on benchmarks including DeepMind Control locomotion and IsaacLab robotics consistently showed reduced critic loss and improved policy performance. The findings were published on arXiv under identifier arXiv:2604.18978v1.
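The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a single critic layer whose randomly initialized base weight W0 is frozen, with only the low-rank factors A and B trained, so every weight update B @ A has rank at most r. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2                    # illustrative sizes; rank r << min(d_in, d_out)

W0 = rng.standard_normal((d_out, d_in))     # frozen, randomly initialized base matrix
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def forward(x):
    """Critic layer output: frozen base path plus low-rank adapter path."""
    return (W0 + B @ A) @ x

# One illustrative gradient step on the adapters only; W0 is never updated.
x = rng.standard_normal(d_in)
g = rng.standard_normal(d_out)              # stand-in for dL/d(output)
lr = 0.1
grad_B = np.outer(g, A @ x)                 # dL/dB via the chain rule
grad_A = np.outer(B.T @ g, x)               # dL/dA via the chain rule
B -= lr * grad_B
A -= lr * grad_A

delta_W = B @ A
print(np.linalg.matrix_rank(delta_W) <= r)  # True: update confined to a rank-r subspace
```

Because the effective weight is W0 + B @ A with W0 fixed, the trainable parameter count grows linearly in r rather than with the full layer size, which is the structural-sparsity constraint the study exploits.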

Key facts

  • LoRA serves as a structural-sparsity regularizer for off-policy RL critics
  • Larger critics are prone to overfitting and instability in bootstrap training
  • Method freezes base matrices and optimizes low-rank adapters only
  • Constrains critic updates to a low-dimensional subspace
  • Built on SimbaV2 with compatible formulation preserving normalization geometry
  • Evaluated with SAC and FastTD3 on DeepMind Control and IsaacLab benchmarks
  • Achieves lower critic loss and stronger policy performance consistently
  • Offers a simple, practical route to scaling critic capacity
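The compatibility point above, that the adapters must not break the backbone's normalization geometry, can be illustrated with a hedged sketch. This is not the paper's formulation; it simply assumes SimbaV2-style hyperspherical normalization means projecting each row of the effective weight onto the unit sphere, applied after the frozen base and the trainable low-rank path are combined.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 6, 3, 2

W0 = rng.standard_normal((d_out, d_in))  # frozen backbone weight
A = rng.standard_normal((r, d_in))       # trainable adapter factors
B = rng.standard_normal((d_out, r))

def normalized_weight(W0, A, B):
    """Project each row of the effective weight W0 + B @ A onto the unit hypersphere."""
    W = W0 + B @ A
    return W / np.linalg.norm(W, axis=1, keepdims=True)

W = normalized_weight(W0, A, B)
print(np.allclose(np.linalg.norm(W, axis=1), 1.0))  # True: rows lie on the unit sphere
```

Whatever values the adapters take during training, the normalized effective weight stays on the hypersphere, which is one way a LoRA formulation can remain compatible with a normalization-constrained backbone.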

Entities

Institutions

  • arXiv
  • DeepMind
  • IsaacLab

Sources