Search Can Harm Model-Based RL Performance

publication · 2026-05-25

A recent study disputes the widely held belief that long-horizon prediction and compounding model errors are the main obstacles in model-based reinforcement learning (RL). The researchers found that search cannot simply substitute for a learned policy and can even degrade performance, even when the model is highly accurate. They argue that mitigating overestimation bias matters more than improving the accuracy of the model or the value function. Taking the minimum over an ensemble of value functions effectively mitigates this bias, which lets search succeed and yields state-of-the-art performance across multiple benchmark domains.
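One way to build intuition for why search can hurt even with an accurate model is a toy simulation. This is an illustrative sketch under stated assumptions, not the study's method or analysis: the model is exact, every candidate action is truly worth 0.0, and a hypothetical learned value function adds zero-mean Gaussian error. Because search keeps the candidate with the highest estimate, the value it selects is biased upward, and that bias grows as more candidates are searched. The helper names `search_with_noisy_value` and `overestimation` are hypothetical.

```python
import random
import statistics


def search_with_noisy_value(num_candidates, noise_std, rng):
    """Hypothetical one-step search with a perfect model.

    Every candidate leads (under the exact model) to a state whose true value
    is 0.0, but the assumed learned value function returns 0.0 plus zero-mean
    Gaussian error. Search returns the estimate of the candidate it picks.
    """
    true_value = 0.0
    estimates = [true_value + rng.gauss(0.0, noise_std) for _ in range(num_candidates)]
    return max(estimates)  # greedy selection over noisy estimates


def overestimation(num_candidates, noise_std=1.0, trials=2000, seed=0):
    """Average selected estimate; since the true value is 0, this is the bias."""
    rng = random.Random(seed)
    return statistics.fmean(
        search_with_noisy_value(num_candidates, noise_std, rng) for _ in range(trials)
    )


# Wider search means more chances to pick an optimistic error.
for n in (1, 4, 16, 64):
    print(n, round(overestimation(n), 2))
```

In this toy setup the single-candidate case stays near zero, while wider searches select increasingly optimistic estimates even though the model itself is perfect.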

Key facts

Search can harm performance even with a highly accurate model.
Mitigating overestimation bias is more important than model accuracy.
Taking the minimum over an ensemble of value functions addresses bias.
The resulting approach achieves state-of-the-art performance across multiple benchmark domains.
Challenges the conventional view of what limits model-based RL.
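The ensemble-minimum fix listed above can be illustrated with a small simulation. This is a minimal sketch under loudly stated assumptions (independent Gaussian errors per ensemble member, a true value of 0.0 for every action, and hypothetical helpers `pessimistic_value` and `selected_bias`); it is not the study's implementation, whose ensemble construction and training details are not described here. Each action's value is taken as the minimum across ensemble members, and the greedy choice is then made on those pessimistic values.

```python
import random
import statistics


def pessimistic_value(ensemble_estimates):
    """Elementwise min over ensemble members, one value per candidate action."""
    # zip(*...) groups the members' estimates for the same action together.
    return [min(values) for values in zip(*ensemble_estimates)]


def selected_bias(ensemble_size, num_actions=10, noise_std=1.0, trials=2000, seed=0):
    """Average value of the greedily selected action; true value is 0, so this is the bias."""
    rng = random.Random(seed)
    biases = []
    for _ in range(trials):
        # Hypothetical ensemble: each member sees true value 0.0 plus independent noise.
        ensemble = [
            [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
            for _ in range(ensemble_size)
        ]
        # Greedy selection on the min-over-ensemble values.
        biases.append(max(pessimistic_value(ensemble)))
    return statistics.fmean(biases)


print("single estimate:", round(selected_bias(1), 2))
print("min over 5 members:", round(selected_bias(5), 2))
```

With `ensemble_size=1` this reduces to plain greedy selection and shows the full upward bias. With a larger ensemble the selected estimate moves close to zero and, in this toy setup, can turn slightly pessimistic, which is the trade-off taking the minimum makes.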
