Universal Horizon Models Improve Offline Reinforcement Learning

other · 2026-05-18

A new paper on arXiv introduces universal horizon models (UHM), a generalization of geometric horizon models (GHM) for offline reinforcement learning. UHM directly predicts future states under arbitrary horizons, which addresses the compounding errors caused by repeated model inference. The proposed value learning method uses a winsorized horizon distribution to stabilize training. In experiments on 100 OGBench tasks, UHM outperforms competitive baselines, especially on tasks with highly suboptimal data.
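
To make the idea of a winsorized horizon distribution concrete, here is a minimal sketch. It assumes the standard statistical meaning of winsorizing (values beyond a cap are set to the cap, so the tail mass piles up at the cap) applied to a geometric horizon distribution with discount gamma. The paper's exact construction may differ; the function names and the `cap` parameter are hypothetical.

```python
import numpy as np

def sample_winsorized_horizons(gamma, cap, size, rng):
    """Draw geometric horizons and winsorize them at `cap` (assumed scheme)."""
    # Geometric horizons: P(h = k) = (1 - gamma) * gamma**(k - 1), k = 1, 2, ...
    horizons = rng.geometric(p=1.0 - gamma, size=size)
    # Winsorize: any horizon above the cap is replaced by the cap itself.
    return np.minimum(horizons, cap)

def winsorized_pmf(gamma, cap):
    """Probability mass function of the winsorized geometric distribution."""
    k = np.arange(1, cap + 1)
    pmf = (1.0 - gamma) * gamma ** (k - 1)
    # All mass from the tail P(h >= cap) = gamma**(cap - 1) lands on the cap.
    pmf[-1] = gamma ** (cap - 1)
    return k, pmf

rng = np.random.default_rng(0)
horizons = sample_winsorized_horizons(gamma=0.99, cap=50, size=1000, rng=rng)
support, pmf = winsorized_pmf(gamma=0.99, cap=50)
```

Under this assumption, capping bounds the largest horizon a model is ever asked to predict, which is one plausible reason such a distribution would stabilize training.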

Key facts

arXiv:2605.15603v1 introduces universal horizon models (UHM).
UHM generalizes geometric horizon models (GHM) for offline RL.
UHM predicts future states under arbitrary horizons.
The value learning method uses a winsorized horizon distribution to stabilize training.
Experiments were conducted on 100 OGBench tasks.
UHM outperforms competitive baselines.
UHM is particularly effective on tasks with highly suboptimal data.
Model-based RL suffers from compounding errors due to repeated model inference.
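
The compounding-error problem in the last point can be illustrated with a toy example. This is a sketch under simple assumptions, not the paper's setup: scalar linear dynamics and a learned one-step model whose coefficient is off by 1%. The names `rollout_error`, `a_true`, and `a_model` are hypothetical.

```python
def rollout_error(a_true, a_model, s0, horizon):
    """Absolute state error after each step of a repeated one-step rollout."""
    s_true, s_model = s0, s0
    errors = []
    for _ in range(horizon):
        s_true = a_true * s_true      # real environment transition
        s_model = a_model * s_model   # model applied to its own previous output
        errors.append(abs(s_model - s_true))
    return errors

# A 1% one-step error grows with every additional model call.
errors = rollout_error(a_true=1.0, a_model=1.01, s0=1.0, horizon=50)
print(f"error after 1 step: {errors[0]:.4f}, after 50 steps: {errors[-1]:.4f}")
```

Here the error is roughly 0.01 after one step and roughly 0.64 after 50 steps, because each prediction is fed back in as the next input.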
