SDPG: Efficient Visual RL Training on a Single GPU

ai-technology · 2026-05-27

Researchers have introduced the stochastic decoupled policy gradient (SDPG), a lightweight visual reinforcement learning method that trains a range of visuomotor control policies end-to-end within hours on a single NVIDIA RTX 4080 GPU. SDPG estimates policy gradients from random perturbations of trajectory rollouts, which cuts the number of batch-rendered environments it needs by orders of magnitude and reduces compute and memory overhead. On visual MuJoCo benchmarks, SDPG consistently outperforms baseline methods in training time, memory use, and reward. The team also released a suite of realistic visual robotics benchmarks covering dexterous manipulation and complex locomotion, and demonstrated sim-to-real transfer on physical hardware. The paper is available on arXiv.

Key facts

SDPG trains visuomotor policies end-to-end on a single NVIDIA RTX 4080 GPU within hours.
SDPG uses random perturbations of trajectory rollouts to estimate policy gradients.
SDPG requires orders of magnitude fewer batch-rendered environments.
SDPG outperforms baselines on visual MuJoCo benchmarks in training time, memory, and rewards.
A new suite of visual robotics benchmarks includes dexterous manipulation and locomotion tasks.
Sim-to-real transfer was demonstrated on physical hardware.
The method reduces compute and memory overhead significantly.
The paper is published on arXiv under Computer Science > Robotics.
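
To give a feel for what estimating a policy gradient from random perturbations of rollouts can look like, here is a minimal sketch. It is not the SDPG algorithm, whose details this summary does not give. It is a generic antithetic (mirrored) random-perturbation estimator applied to a toy two-dimensional point with no rendering at all. The environment, the linear policy, and every name and hyperparameter (rollout_return, perturbation_gradient, train, sigma, pairs, lr) are assumptions made for illustration.

```python
import random

HORIZON = 20   # steps per rollout (assumed value)
DT = 0.1       # integration step of the toy dynamics (assumed value)


def rollout_return(theta, start=(1.0, -1.0)):
    """Total reward of one deterministic rollout of a toy 2-D point.

    Hypothetical stand-in for a rendered environment. The policy is a
    per-axis linear gain: action_i = theta[i] * state_i. The reward at
    each step is the negative squared distance to the origin.
    """
    state = list(start)
    total = 0.0
    for _ in range(HORIZON):
        for i in range(len(state)):
            state[i] += DT * theta[i] * state[i]
        total -= sum(s * s for s in state)
    return total


def perturbation_gradient(theta, rng, sigma=0.1, pairs=16):
    """Estimate d(return)/d(theta) from mirrored random perturbations.

    Each pair evaluates the return at theta + sigma*eps and at
    theta - sigma*eps, then weights the difference by eps. This is a
    generic zeroth-order estimator, not the estimator from the paper.
    """
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        diff = rollout_return(plus) - rollout_return(minus)
        for i, e in enumerate(eps):
            grad[i] += diff * e / (2.0 * sigma * pairs)
    return grad


def train(iterations=200, lr=0.02, seed=0):
    """Plain gradient ascent on the estimated gradient."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(iterations):
        grad = perturbation_gradient(theta, rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


initial_return = rollout_return([0.0, 0.0])
final_theta = train()
final_return = rollout_return(final_theta)
print(f"return before: {initial_return:.2f}, after: {final_return:.2f}")
```

The estimator only needs rollout returns, never gradients through the simulator or renderer, which is one plausible reason a perturbation-based approach could get by with few rendered environments. How SDPG actually decouples and samples its perturbations is described in the paper, not here.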
