KL-Regularization Framework Enhances MPPI Planning in MBRL
A new approach to model-based reinforcement learning (MBRL) addresses the difficulty of exploration in high-dimensional continuous control tasks. Recent methods use learned policies as proposal distributions for Model-Predictive Path Integral (MPPI) planning. Earlier methods updated the sampling policy independently of the planner, maximizing a learned value function with deterministic policy gradients and entropy regularization. However, aligning the sampling policy more closely with the planner can improve both the accuracy of value estimation and long-term performance. More recent approaches either minimize the KL divergence between the sampling policy and the planner distribution or add planner-guided regularization to the policy update. This study unifies these MPPI-based methods under a single KL-regularization framework with adaptive priors.
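To make the idea of a learned policy acting as an MPPI proposal distribution concrete, here is a minimal NumPy sketch. It is not the study's implementation: the function name `mppi_plan`, its parameters, and the toy `score_fn` are hypothetical, and the Gaussian `mean` and `std` simply stand in for whatever a learned policy would output. In a real MBRL agent, `score_fn` would roll action sequences through a learned dynamics model and value function.

```python
import numpy as np


def mppi_plan(mean, std, score_fn, num_samples=512, temperature=1.0,
              iterations=6, rng=None):
    """Refine a Gaussian over action sequences with MPPI-style weighting.

    mean, std: arrays of shape (horizon, action_dim). In this sketch they
        play the role of the proposal supplied by a learned policy
        (an assumption; the source does not give the exact form).
    score_fn: maps actions of shape (num_samples, horizon, action_dim)
        to estimated returns of shape (num_samples,).
    """
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(iterations):
        # Sample candidate action sequences from the current proposal.
        noise = rng.standard_normal((num_samples,) + mean.shape)
        actions = mean + std * noise
        returns = score_fn(actions)
        # Exponentiated-return weights (softmax), shifted for stability.
        weights = np.exp((returns - returns.max()) / temperature)
        weights /= weights.sum()
        # Refit the Gaussian to the weighted samples.
        mean = np.einsum("n,nha->ha", weights, actions)
        var = np.einsum("n,nha->ha", weights, (actions - mean) ** 2)
        std = np.sqrt(var) + 1e-3  # small floor so sampling never collapses
    return mean, std


# Toy usage: returns are highest when every action equals 0.5.
target = np.full((3, 2), 0.5)
toy_score = lambda a: -np.sum((a - target) ** 2, axis=(1, 2))
plan_mean, plan_std = mppi_plan(np.zeros((3, 2)), np.ones((3, 2)), toy_score,
                                rng=np.random.default_rng(0))
```

The returned `plan_mean` and `plan_std` describe the planner distribution. A better proposal (a policy that already resembles the planner's output) means fewer samples are wasted on low-return action sequences, which is one intuition for why aligning the two can help.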
Key facts
- The framework targets model-based reinforcement learning (MBRL).
- It focuses on high-dimensional continuous control tasks.
- Learned policies are used as proposal distributions for MPPI planning.
- Early methods update the sampling policy independently of the planner.
- Aligning the sampling policy with the planner can improve value estimation and long-term performance.
- Recent methods minimize the KL divergence between the sampling policy and the planner distribution.
- Other recent methods add planner-guided regularization to the policy update.
- This work unifies these MPPI-based approaches in a KL-regularization framework with adaptive priors.
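As an illustration of the KL term these methods rely on, the sketch below computes the closed-form KL divergence between two diagonal Gaussians, a common parameterization for both a sampling policy and an MPPI planner distribution. The function names, the penalty weight `beta`, and the direction of the divergence are assumptions made for illustration; the source does not specify them.

```python
import numpy as np


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over all dimensions."""
    mu_p, std_p = np.asarray(mu_p, float), np.asarray(std_p, float)
    mu_q, std_q = np.asarray(mu_q, float), np.asarray(std_q, float)
    per_dim = (np.log(std_q / std_p)
               + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * std_q ** 2)
               - 0.5)
    return float(np.sum(per_dim))


def kl_regularized_loss(value_estimate, kl, beta=0.1):
    """Hypothetical policy loss: maximize value, stay close to a prior.

    The prior could be, for example, the planner distribution; beta is an
    assumed trade-off coefficient, not a value taken from the source.
    """
    return -value_estimate + beta * kl


# Policy N(0, 1) versus planner N(1, 1) in one dimension.
kl = diag_gaussian_kl([0.0], [1.0], [1.0], [1.0])
loss = kl_regularized_loss(value_estimate=2.0, kl=kl, beta=0.1)
```

Setting `beta` to zero recovers a pure value-maximizing update that ignores the planner, while larger values pull the policy toward the prior.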
Entities
Institutions
- arXiv