EfficientTDMPC Boosts Sample Efficiency in Continuous Control
EfficientTDMPC is a new model-based reinforcement learning method for continuous control, built on the TD-MPC family. It improves sample efficiency by reducing estimation error in the planner's return objective: return estimates are averaged over an ensemble of dynamics models, and actions whose returns are uncertain are penalized. The method also includes practical improvements for buffer data freshness and compute efficiency, and it benefits from higher update-to-data (UTD) ratios.
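To make the idea of an uncertainty-penalized ensemble objective concrete, here is a minimal sketch in Python with NumPy. It is not the paper's implementation. It assumes the penalty is the ensemble standard deviation scaled by a coefficient, which is one common choice; the function names, the penalty_coef parameter, and the toy numbers are all hypothetical.

```python
import numpy as np


def penalized_returns(ensemble_returns: np.ndarray, penalty_coef: float = 1.0) -> np.ndarray:
    """Score candidate action sequences from per-model return estimates.

    ensemble_returns has shape (n_models, n_candidates): one estimated return
    per dynamics model and per candidate action sequence.
    Assumption: the penalty is penalty_coef * ensemble standard deviation.
    """
    mean_return = ensemble_returns.mean(axis=0)  # average over the ensemble
    disagreement = ensemble_returns.std(axis=0)  # spread across the ensemble
    return mean_return - penalty_coef * disagreement


def select_candidate(ensemble_returns: np.ndarray, penalty_coef: float = 1.0) -> int:
    """Return the index of the candidate with the best penalized score."""
    return int(np.argmax(penalized_returns(ensemble_returns, penalty_coef)))


# Toy data (made up): 3 models, 2 candidates.
# Candidate 0: all models agree on a return of 1.0.
# Candidate 1: higher mean return (1.5), but the models disagree strongly.
toy_returns = np.array([
    [1.0, 3.0],
    [1.0, 0.0],
    [1.0, 1.5],
])

unpenalized_choice = select_candidate(toy_returns, penalty_coef=0.0)
penalized_choice = select_candidate(toy_returns, penalty_coef=1.0)
```

With no penalty the planner in this toy case prefers the candidate with the higher mean return. With the penalty switched on it prefers the candidate the models agree on. That is the behavior the uncertainty penalty is meant to encourage.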
Key facts
- EfficientTDMPC is a model-based reinforcement learning method for continuous control.
- It is built on the TD-MPC family of algorithms.
- It uses an ensemble of dynamics models to average return estimates.
- It applies an uncertainty penalty to avoid actions with uncertain returns.
- It includes practical improvements that keep buffer data fresh and reduce compute.
- It benefits from a higher update-to-data (UTD) ratio, meaning more gradient updates per environment step.
- The method aims to reduce error from learned models and value networks.
- It was introduced in arXiv paper 2605.16692.