Search Can Harm Model-Based RL Performance
A recent study challenges the widely held belief that long-horizon prediction and compounding errors are the main obstacles in model-based reinforcement learning (RL). The researchers found that search is not a drop-in replacement for a learned policy and can degrade performance even when the model is highly accurate. They argue that mitigating overestimation bias matters more than improving the accuracy of the model or the value function. Taking the minimum over an ensemble of value functions mitigates this bias, which makes search effective and yields state-of-the-art performance across multiple benchmark domains.
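To make the ensemble-minimum idea concrete, here is a minimal sketch in Python with NumPy. It is not the study's implementation. The function name `pessimistic_value`, the toy numbers, and the assumption that search picks the action with the highest estimated value are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pessimistic_value(ensemble_values):
    """Return the element-wise minimum over ensemble members (axis 0).

    ensemble_values has shape (num_members, num_actions): one row of
    value estimates per value function in the ensemble.
    """
    return np.min(np.asarray(ensemble_values, dtype=float), axis=0)


# Hypothetical toy estimates: 3 value functions scoring 4 candidate actions.
# Assume the (unknown to the agent) true values are [1.0, 1.2, 0.8, 0.5],
# so action 1 is actually best and action 3 is actually worst.
ensemble_values = np.array([
    [1.0, 1.2, 0.8, 2.5],  # member 0 badly overestimates action 3
    [1.1, 1.3, 0.9, 0.4],
    [0.9, 1.1, 0.7, 0.6],
])

# Greedy selection with a single value function chases the overestimate.
greedy_single = int(np.argmax(ensemble_values[0]))

# Greedy selection on the ensemble minimum ignores the lone outlier.
greedy_min = int(np.argmax(pessimistic_value(ensemble_values)))

print("single value function picks action", greedy_single)
print("ensemble minimum picks action", greedy_min)
```

In this toy case the single value function selects action 3, the one it overestimates, while the ensemble minimum selects action 1. The intuition is that a maximizing search tends to exploit whichever estimate happens to be too high, and an overestimate has to appear in every ensemble member to survive the minimum.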
Key facts
- Search can harm performance even with a highly accurate model.
- Mitigating overestimation bias matters more than improving the accuracy of the model or the value function.
- Taking the minimum over an ensemble of value functions mitigates overestimation bias.
- With this fix, search achieves state-of-the-art performance across multiple benchmark domains.
- The findings challenge the conventional view that long-horizon prediction and compounding errors are the main obstacles in model-based RL.
Entities
Institutions
- arXiv