ARTFEED — Contemporary Art Intelligence

Search Can Harm Model-Based RL Performance

publication · 2026-05-25

A recent study challenges the widely held view that long-horizon prediction and compounding model errors are the main obstacles in model-based reinforcement learning (RL). The researchers found that search cannot simply replace a learned policy and can degrade performance even when the model is highly accurate. They argue that mitigating overestimation bias matters more than improving the accuracy of the model or the value function. Taking the minimum over an ensemble of value functions effectively reduces this bias, which lets search succeed and yields state-of-the-art performance across several benchmark domains.
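Why search can hurt even with an accurate model is easy to see in a toy setting. The sketch below illustrates the general effect under stated assumptions and is not the study's method. It assumes a perfect model, a set of candidate plans whose true values are all zero, and a learned value function whose errors are unbiased Gaussian noise. A greedy search that keeps the plan with the highest estimated value still reports an optimistic value, and that optimism grows with the number of candidates compared. The function `search_overestimation` and its parameters are hypothetical names chosen for illustration.

```python
import random
import statistics

def search_overestimation(num_candidates, noise_std=1.0, trials=2000, seed=0):
    """Average gap between the value a greedy search believes it found and the
    true value of the plan it picked. Every candidate is truly worth 0, so any
    positive gap is pure overestimation bias (toy assumption, not the paper's setup)."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        # Perfect model: every candidate plan has the same true value.
        true_values = [0.0] * num_candidates
        # Hypothetical learned value function: unbiased, but noisy.
        estimates = [v + rng.gauss(0.0, noise_std) for v in true_values]
        # Greedy search keeps the plan that looks best under the noisy estimates.
        best = max(range(num_candidates), key=lambda i: estimates[i])
        gaps.append(estimates[best] - true_values[best])
    return statistics.mean(gaps)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(f"{n:>3} candidates: mean overestimation {search_overestimation(n):.2f}")
```

Running it should print a gap near zero for a single candidate and steadily larger gaps as the candidate count rises: the wider the search, the more chances it has to select a plan whose value was overestimated by chance, even though each individual estimate is unbiased.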

Key facts

  • Search can harm performance even with a highly accurate model.
  • Mitigating overestimation bias is more important than model accuracy.
  • Taking the minimum over an ensemble of value functions addresses bias.
  • Achieves state-of-the-art performance across multiple benchmark domains.
  • Challenges conventional wisdom about model-based RL obstacles.
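The ensemble-minimum fix listed above can be sketched in the same toy setting. This is a minimal illustration, not the authors' implementation. It assumes K independent value estimators with unbiased Gaussian errors, candidate plans whose true values are all zero, and a search that keeps the candidate with the highest aggregated score. The helper `selected_value_bias` and its parameters are hypothetical. Swapping the aggregate from the ensemble mean to the ensemble minimum lowers the value the search believes it has found.

```python
import random
import statistics

def selected_value_bias(num_candidates, ensemble_size, aggregate,
                        noise_std=1.0, trials=2000, seed=0):
    """Average value reported by a greedy search over `num_candidates` plans
    whose true value is 0, when each plan is scored by aggregating an ensemble
    of noisy value estimates (toy assumption). Positive means overestimation."""
    rng = random.Random(seed)
    reported = []
    for _ in range(trials):
        scores = []
        for _ in range(num_candidates):
            # Each ensemble member returns an unbiased but noisy estimate of 0.
            members = [rng.gauss(0.0, noise_std) for _ in range(ensemble_size)]
            scores.append(aggregate(members))
        # The search keeps the best-looking plan; its true value is 0,
        # so the reported score equals the estimation error.
        reported.append(max(scores))
    return statistics.mean(reported)

if __name__ == "__main__":
    single = selected_value_bias(16, 1, min)
    ens_mean = selected_value_bias(16, 5, statistics.mean)
    ens_min = selected_value_bias(16, 5, min)
    print(f"single critic: {single:.2f}")
    print(f"ensemble mean: {ens_mean:.2f}")
    print(f"ensemble min:  {ens_min:.2f}")
```

Because the minimum of a set of estimates is never larger than their mean, the min-aggregated search can only report a lower, less optimistic value for the same noise draws. In this toy setting it may even fall below the true value, so the minimum trades optimism for some pessimism rather than removing estimation error altogether.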

Entities

Institutions

  • arXiv

Sources