Gradient Penalized Latent Dynamics Improves DreamerV3 Sample Efficiency

ai-technology · 2026-05-25

GPLD (Gradient-Penalized Latent Dynamics) is a regularizer for the DreamerV3 reinforcement learning framework that encourages local smoothness in the learned transition dynamics. It applies a row-wise Jacobian penalty to the posterior latent distribution, a continuous-latent analogue of the finite-difference smoothing used in discrete embedded-state MDPs. The penalty is estimated efficiently with Hutchinson-style stochastic probes. On DeepMind Control proprioceptive tasks, GPLD improves aggregate sample efficiency, with the strongest gains on higher-complexity locomotion environments such as the quadruped tasks.
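
To make the idea of a Hutchinson-style probe concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not spell out the exact row-wise form of the penalty, so the sketch estimates the squared Frobenius norm of a Jacobian, using the identity E[||J u||^2] = ||J||_F^2 for random sign (Rademacher) vectors u. The names hutchinson_jacobian_sq_norm and toy_transition are hypothetical, and central finite differences stand in for the automatic differentiation a real DreamerV3 codebase would presumably use.

```python
import math
import random


def hutchinson_jacobian_sq_norm(f, x, num_probes=8, eps=1e-4, rng=None):
    """Estimate ||J_f(x)||_F^2 with random probes (illustrative helper).

    For Rademacher probes u (entries +1 or -1), E[||J u||^2] = ||J||_F^2,
    so averaging ||J u||^2 over a few probes avoids building J explicitly.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(num_probes):
        u = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + eps * ui for xi, ui in zip(x, u)]
        x_minus = [xi - eps * ui for xi, ui in zip(x, u)]
        # Central difference approximates the Jacobian-vector product J u.
        # (Assumption: a real implementation would use autodiff instead.)
        ju = [(a - b) / (2.0 * eps) for a, b in zip(f(x_plus), f(x_minus))]
        total += sum(c * c for c in ju)
    return total / num_probes


def toy_transition(z):
    """Hypothetical 2-D stand-in for a learned latent transition function."""
    return [math.tanh(z[0] + z[1]), 0.5 * z[1]]


if __name__ == "__main__":
    penalty = hutchinson_jacobian_sq_norm(toy_transition, [0.0, 0.0], num_probes=64)
    print("estimated squared Jacobian norm:", penalty)
```

Adding a term like this to a training loss penalizes sharp local changes in the mapped output, which is the smoothing effect described above. Each probe costs two function evaluations regardless of the latent dimension, which is why stochastic probing is cheaper than forming the full Jacobian.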

Key facts

GPLD is a gradient-penalized latent dynamics regularizer for DreamerV3.
It applies a row-wise Jacobian penalty to the posterior latent distribution.
The penalty encourages locally smooth transition learning.
It is estimated using Hutchinson-style stochastic probes.
GPLD improves aggregate sample efficiency on DeepMind Control tasks.
Strong gains are observed on higher-complexity locomotion environments.
The method is tested on quadruped tasks.
The paper is available on arXiv with ID 2605.23089.
