Dream-MPC: Gradient-Based Model Predictive Control with Latent Imagination
A new reinforcement learning approach, Dream-MPC, combines gradient-based trajectory optimization with model predictive control (MPC). The method generates a small number of candidate trajectories from a policy prior and refines them via gradient ascent through a learned world model, using uncertainty regularization and amortization. This addresses the computational expense of gradient-free population-based methods, which have previously outperformed gradient-based alternatives on high-dimensional control tasks. The work is published on arXiv under identifier 2605.04568.
Key facts
- Dream-MPC is a novel approach combining gradient-based optimization with MPC.
- It generates a small number of candidate trajectories by rolling out a policy prior.
- Each trajectory is optimized by gradient ascent using a learned world model.
- The method includes uncertainty regularization and amortization.
- It aims to reduce computational cost compared to gradient-free population-based methods.
- Gradient-free methods have empirically outperformed gradient-based ones in prior work.
- The paper is available on arXiv with ID 2605.04568.
- The approach targets high-dimensional control tasks.
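The propose-then-refine loop described above can be sketched in a few lines. Everything below is an illustrative stand-in, not the paper's actual method: a linear world model with a quadratic reward replaces the learned world model so that the gradient of the imagined return can be computed exactly with an adjoint (backward) pass, and the "policy prior" is just a handful of random action sequences. Uncertainty regularization and amortization are omitted.

```python
import numpy as np

def imagined_return(s0, actions, A, B, c=0.1):
    """Roll the (toy, linear) world model forward and sum rewards
    r_t = -||s_{t+1}||^2 - c*||a_t||^2."""
    s, total = s0, 0.0
    for a in actions:
        s = A @ s + B @ a
        total += -(s @ s) - c * (a @ a)
    return total

def return_gradient(s0, actions, A, B, c=0.1):
    """Exact gradient of the imagined return w.r.t. each action,
    computed by a reverse (adjoint) pass through the rollout."""
    H = len(actions)
    states = [s0]
    for a in actions:
        states.append(A @ states[-1] + B @ a)
    grads = np.zeros_like(actions)
    g_s = np.zeros_like(s0)                    # dJ/ds beyond the horizon is zero
    for t in reversed(range(H)):
        g_next = -2.0 * states[t + 1] + g_s    # dJ/ds_{t+1}
        grads[t] = B.T @ g_next - 2.0 * c * actions[t]
        g_s = A.T @ g_next                     # propagate adjoint one step back
    return grads

def refine(s0, actions, A, B, steps=50, lr=0.05):
    """Gradient-ascent refinement of one candidate action sequence."""
    actions = actions.copy()
    for _ in range(steps):
        actions += lr * return_gradient(s0, actions, A, B)
    return actions

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double-integrator dynamics
B = np.array([[0.0], [0.1]])
s0 = np.array([1.0, 0.0])

# "Policy prior": a few candidate action sequences over a short horizon.
candidates = [rng.normal(size=(10, 1)) for _ in range(4)]
refined = [refine(s0, a, A, B) for a in candidates]
best = max(refined, key=lambda a: imagined_return(s0, a, A, B))
```

Because the toy objective is concave and the step size is small, each gradient step increases the imagined return, so every refined candidate scores at least as well as its initial proposal; the best refined sequence would then be executed MPC-style.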
Entities
Institutions
- arXiv