GORMPO: Generative OOD-Regularized Model-Based Policy Optimization for Offline RL

other · 2026-05-26

Generative OOD-regularized Model-based Policy Optimization (GORMPO) is a new offline reinforcement learning algorithm that addresses out-of-distribution (OOD) actions in sparse state-action spaces. It integrates generative density estimation into model-based RL to restrict policy updates to high-density regions of the dataset, with the aim of producing safer offline policies. The study also compares the OOD detection capabilities of several density estimators and how each performs within the algorithm. The paper is available on arXiv under ID 2605.24405.
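The idea of restricting updates to high-density regions can be illustrated with a minimal sketch. It is not the paper's implementation: the single-Gaussian density model, the percentile threshold, the hinge-style penalty, and the names GaussianDensity, penalized_reward, and lam are all assumptions chosen for illustration. A generative density model would take the place of the Gaussian here.

```python
import numpy as np


class GaussianDensity:
    """Stand-in density estimator (assumption): one full-covariance Gaussian over (s, a)."""

    def fit(self, x):
        dim = x.shape[1]
        self.mean = x.mean(axis=0)
        # Small ridge term keeps the covariance invertible.
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(dim)
        self.prec = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        self.log_norm = -0.5 * (dim * np.log(2 * np.pi) + logdet)
        return self

    def log_prob(self, x):
        d = x - self.mean
        # Quadratic form d^T P d for each row of x.
        return self.log_norm - 0.5 * np.einsum("ni,ij,nj->n", d, self.prec, d)


def penalized_reward(reward, log_p, threshold, lam=1.0):
    """Hypothetical penalty: subtract lam times the shortfall of log-density below threshold."""
    shortfall = np.maximum(0.0, threshold - log_p)
    return reward - lam * shortfall


rng = np.random.default_rng(0)
# Hypothetical offline dataset: 3 state dims concatenated with 1 action dim.
dataset = rng.normal(size=(2000, 4))
density = GaussianDensity().fit(dataset)

# Treat the lowest 5% of dataset log-densities as the OOD boundary (assumption).
threshold = np.percentile(density.log_prob(dataset), 5)

in_dist = np.zeros((1, 4))   # near the centre of the data
ood = np.full((1, 4), 6.0)   # far from anything in the data
reward = np.array([1.0])

r_in = penalized_reward(reward, density.log_prob(in_dist), threshold)
r_ood = penalized_reward(reward, density.log_prob(ood), threshold)
print(r_in, r_ood)
```

In this sketch a state-action pair inside the data support keeps its model-predicted reward, while one far outside it is penalized in proportion to how unlikely it is, so a policy trained on such rewards has less incentive to leave the dataset's support.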

Key facts

GORMPO is a density-regularized offline RL algorithm.
It uses generative density modeling to avoid OOD actions.
The method targets sparse state-action spaces.
The study compares the OOD detection capabilities of different density estimators and their performance within the algorithm.
The paper is available on arXiv (ID 2605.24405).
The approach aims to ensure safe offline policies.
The study explores integration of density estimation into model-based RL.
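
Comparing density estimators as OOD detectors can be sketched as follows. This is an illustrative setup, not the paper's protocol: AUROC as the metric, the synthetic data, and the two stand-in scorers (a diagonal Gaussian and a k-nearest-neighbour distance score) are assumptions, and the summary above does not say which estimators or metrics the study uses.

```python
import numpy as np


def auroc(in_scores, ood_scores):
    """Probability that a random in-distribution sample outscores a random OOD sample.

    Ties count as half, which matches the usual rank-based AUROC definition.
    """
    diff = in_scores[:, None] - ood_scores[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def gaussian_log_density(train, x):
    """Diagonal-Gaussian log-density fitted on train, evaluated at x."""
    mean = train.mean(axis=0)
    var = train.var(axis=0) + 1e-6
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var), axis=1)


def knn_score(train, x, k=5):
    """Negative distance to the k-th nearest training point (higher means denser)."""
    dists = np.linalg.norm(x[:, None, :] - train[None, :, :], axis=2)
    return -np.sort(dists, axis=1)[:, k - 1]


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))              # hypothetical dataset (s, a) pairs
test_in = rng.normal(size=(200, 4))            # held-out in-distribution pairs
test_ood = rng.normal(loc=4.0, size=(200, 4))  # synthetic OOD pairs, shifted away

results = {
    "diag_gaussian": auroc(gaussian_log_density(train, test_in),
                           gaussian_log_density(train, test_ood)),
    "knn": auroc(knn_score(train, test_in), knn_score(train, test_ood)),
}
for name, value in results.items():
    print(f"{name}: AUROC = {value:.3f}")
```

Each estimator only needs to expose a score where higher means "more like the dataset", so additional density models can be added to the comparison without changing the metric.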
