Universal Horizon Models Improve Offline Reinforcement Learning
A new paper on arXiv introduces universal horizon models (UHM), a generalization of geometric horizon models (GHM) for offline reinforcement learning. A UHM predicts future states directly for an arbitrary horizon, which addresses the compounding errors that arise when a model is queried repeatedly to roll out a trajectory. The proposed value learning method uses a winsorized horizon distribution to stabilize training. In experiments on 100 OGBench tasks, UHM outperforms competitive baselines, especially on tasks with highly suboptimal data.
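To make the idea of a winsorized horizon distribution concrete, here is a minimal Python sketch. It rests on assumptions that the summary does not confirm: that the underlying horizon distribution is geometric with parameter gamma (as the name "geometric horizon model" suggests), and that "winsorized" means horizons above a cap are clipped to that cap, so the tail mass collects at the largest allowed horizon. The function names and the parameters `gamma` and `max_horizon` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_winsorized_geometric(gamma, max_horizon, size, rng):
    """Sample horizons k >= 1 from a geometric distribution, then clip at max_horizon.

    Assumption: P(k) = (1 - gamma) * gamma**(k - 1) before clipping.
    """
    horizons = rng.geometric(p=1.0 - gamma, size=size)
    # Winsorizing: values above the cap are replaced by the cap itself.
    return np.minimum(horizons, max_horizon)


def winsorized_geometric_pmf(gamma, max_horizon):
    """Probability mass over horizons 1..max_horizon after clipping."""
    k = np.arange(1, max_horizon + 1)
    pmf = (1.0 - gamma) * gamma ** (k - 1)
    # The last entry absorbs the whole tail P(k >= max_horizon).
    pmf[-1] = gamma ** (max_horizon - 1)
    return pmf


rng = np.random.default_rng(0)
samples = sample_winsorized_geometric(gamma=0.99, max_horizon=50, size=10_000, rng=rng)
print("largest sampled horizon:", samples.max())
print("pmf sums to:", winsorized_geometric_pmf(0.99, 50).sum())
```

Under these assumptions, clipping bounds the longest horizon the model is ever asked to predict, which is one plausible reading of how the distribution could stabilize training.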
Key facts
- arXiv:2605.15603v1 introduces universal horizon models (UHM).
- UHM generalizes geometric horizon models (GHM) for offline RL.
- UHM predicts future states under arbitrary horizons.
- The value learning method uses a winsorized horizon distribution to stabilize training.
- Experiments were conducted on 100 OGBench tasks.
- UHM outperforms competitive baselines.
- UHM is particularly effective on tasks with highly suboptimal data.
- Model-based RL suffers from compounding errors due to repeated model inference.
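The compounding-error point can be illustrated with a toy calculation in Python. This is not the paper's experiment: it assumes a hypothetical scalar model whose every one-step prediction is off by a fixed multiplicative factor (1 + eps), and the function name `rollout_relative_error` is invented for illustration. Chaining such a model for k steps multiplies the errors, so the relative error grows as (1 + eps)^k - 1.

```python
def rollout_relative_error(eps, horizon):
    """Relative error after chaining `horizon` one-step predictions.

    Assumption: each step multiplies the prediction by (1 + eps)
    relative to the true dynamics, so errors compound geometrically.
    """
    return (1.0 + eps) ** horizon - 1.0


# A 1% per-step error stays small for short rollouts but grows quickly.
for k in (1, 10, 100):
    print(f"horizon {k:3d}: relative error {rollout_relative_error(0.01, k):.4f}")
```

A model that predicts the state at horizon k in a single inference step does not chain its own outputs, so it is not subject to this particular multiplication of per-step errors.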
Entities
Institutions
- arXiv