ARTFEED — Contemporary Art Intelligence

MDPO: Stochastic Exploration for Differentiable Planning

other · 2026-05-11

The Model-Driven Policy Optimization (MDPO) framework brings stochastic exploration to differentiable planning by injecting noise into the action space during optimization. The noise magnitude is adapted to the sensitivity of the trajectory objective, as measured by its gradients, producing an exploration profile that varies over time. Allocating exploration this way helps the optimizer escape poor local optima in nonlinear and hybrid discrete-continuous domains, whose optimization landscapes are often ill-conditioned. Experiments on benchmark domains indicate improved optimization landscapes.

Key facts

  • Differentiable planning solves decision-making problems via gradient-based optimization.
  • Nonlinear and hybrid discrete-continuous domains often have ill-conditioned optimization landscapes.
  • MDPO injects noise into the action space during optimization.
  • Noise magnitude is adapted based on gradient-derived sensitivity of the trajectory objective.
  • MDPO yields a time-dependent exploration profile.
  • Dynamic allocation of exploration across timesteps and iterations helps escape poor local optima.
  • Experiments were conducted on benchmark domains.
  • The paper is available on arXiv with ID 2605.07520.
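The mechanism in the key facts above can be sketched in a few lines: gradient descent over an action sequence, with per-timestep noise whose scale is derived from gradient magnitudes and annealed across iterations. This is a minimal illustrative sketch, not the paper's exact formulation; the toy objective, noise schedule, and update rule are all assumptions for demonstration.

```python
import numpy as np

def trajectory_cost(actions):
    # Hypothetical nonconvex objective over a T-step action sequence
    # (stands in for a rolled-out trajectory objective).
    return np.sum(np.sin(3 * actions) + 0.5 * actions ** 2)

def trajectory_grad(actions):
    # Analytic gradient of the toy objective above.
    return 3 * np.cos(3 * actions) + actions

def noisy_plan(T=10, iters=200, lr=0.05, base_sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    actions = rng.normal(size=T)
    for k in range(iters):
        g = trajectory_grad(actions)
        # Per-timestep noise scale adapted to gradient-derived sensitivity:
        # here, timesteps with small gradients receive more exploration
        # (one plausible reading; the paper's rule may differ).
        sigma = base_sigma / (1.0 + np.abs(g))
        # Anneal exploration over iterations, giving a time-dependent profile.
        sigma *= 1.0 - k / iters
        actions = actions - lr * g + sigma * rng.normal(size=T)
    return actions, trajectory_cost(actions)

acts, cost = noisy_plan()
print(acts.shape, float(cost))
```

The early iterations take large, noisy steps that can hop between basins of the nonconvex objective; as the noise anneals to zero, the update reduces to plain gradient descent and settles into a local optimum.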

Entities

Institutions

  • arXiv
