ARTFEED — Contemporary Art Intelligence

LoRA Enhances Off-Policy Reinforcement Learning Critics via Low-Rank Adaptation

ai-technology · 2026-04-22

A recent study recasts Low-Rank Adaptation (LoRA) as a structural-sparsity regularizer for off-policy reinforcement learning critics. The approach targets a known failure mode: larger critics frequently overfit and destabilize during replay-buffer-based bootstrap training. By optimizing only low-rank adapters while keeping randomly initialized base matrices frozen, the method confines critic updates to a low-dimensional subspace. Building on SimbaV2, the researchers formulated LoRA so that SimbaV2's hyperspherical normalization geometry is preserved under frozen-backbone training. Evaluations with the SAC and FastTD3 algorithms on benchmarks including DeepMind Control locomotion and IsaacLab robotics consistently showed reduced critic loss and improved policy performance. The findings were published on arXiv under identifier arXiv:2604.18978v1.
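The mechanism described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: a single critic layer whose randomly initialized base weight W0 is frozen, with only the low-rank factors A and B trained, so every weight update B @ A has rank at most r. All names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2                    # illustrative sizes; rank r << min(d_in, d_out)

W0 = rng.standard_normal((d_out, d_in))     # frozen, randomly initialized base matrix
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def forward(x):
    """Critic layer output: frozen base path plus low-rank adapter path."""
    return (W0 + B @ A) @ x

# One illustrative gradient step on the adapters only; W0 is never updated.
x = rng.standard_normal(d_in)
g = rng.standard_normal(d_out)              # stand-in for dL/d(output)
lr = 0.1
grad_B = np.outer(g, A @ x)                 # dL/dB via the chain rule
grad_A = np.outer(B.T @ g, x)               # dL/dA via the chain rule
B -= lr * grad_B
A -= lr * grad_A

delta_W = B @ A
print(np.linalg.matrix_rank(delta_W) <= r)  # True: update confined to a rank-r subspace
```

Because the effective weight is W0 + B @ A with W0 fixed, the trainable parameter count grows linearly in r rather than with the full layer size, which is the structural-sparsity constraint the study exploits.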

Key facts

  • LoRA serves as a structural-sparsity regularizer for off-policy RL critics
  • Larger critics are prone to overfitting and instability in bootstrap training
  • Method freezes base matrices and optimizes low-rank adapters only
  • Constrains critic updates to a low-dimensional subspace
  • Built on SimbaV2 with compatible formulation preserving normalization geometry
  • Evaluated with SAC and FastTD3 on DeepMind Control and IsaacLab benchmarks
  • Achieves lower critic loss and stronger policy performance consistently
  • Offers a simple, practical route to scaling critic capacity
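The compatibility point above, that the adapters must not break the backbone's normalization geometry, can be illustrated with a hedged sketch. This is not the paper's formulation; it simply assumes SimbaV2-style hyperspherical normalization means projecting each row of the effective weight onto the unit sphere, applied after the frozen base and the trainable low-rank path are combined.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 6, 3, 2

W0 = rng.standard_normal((d_out, d_in))  # frozen backbone weight
A = rng.standard_normal((r, d_in))       # trainable adapter factors
B = rng.standard_normal((d_out, r))

def normalized_weight(W0, A, B):
    """Project each row of the effective weight W0 + B @ A onto the unit hypersphere."""
    W = W0 + B @ A
    return W / np.linalg.norm(W, axis=1, keepdims=True)

W = normalized_weight(W0, A, B)
print(np.allclose(np.linalg.norm(W, axis=1), 1.0))  # True: rows lie on the unit sphere
```

Whatever values the adapters take during training, the normalized effective weight stays on the hypersphere, which is one way a LoRA formulation can remain compatible with a normalization-constrained backbone.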

Entities

Institutions

  • arXiv
  • DeepMind
  • IsaacLab

Sources