DOSER: Diffusion-Based OOD Detection for Offline RL

other · 2026-05-12

A new framework called DOSER (Diffusion-based OOD Detection and Selective Regularization) addresses value overestimation for out-of-distribution (OOD) actions in offline reinforcement learning. Existing methods uniformly penalize all unseen samples. DOSER instead uses two diffusion models, one capturing the behavior policy and one capturing the state distribution, and treats their single-step denoising reconstruction error as an OOD indicator. During policy optimization it separates beneficial OOD actions from detrimental ones, so useful exploration beyond the dataset is not suppressed. The approach is detailed in arXiv:2605.08202.
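The following is a minimal NumPy sketch of a single-step denoising reconstruction error used as an OOD score. It assumes DDPM-style noising, x_t = sqrt(abar) * x0 + sqrt(1 - abar) * eps. The trained noise-prediction network is replaced by `gaussian_eps_predictor`, a hypothetical closed-form stand-in that is exact only when the data are Gaussian. The function names, the noise level `abar`, and the averaging over noise draws are illustrative choices, not details taken from the paper. For a behavior-policy model, the denoiser would also take the state as an input.

```python
import numpy as np

def gaussian_eps_predictor(mu, s):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t).

    For data x0 ~ N(mu, s^2 I), the Bayes-optimal noise prediction is
    E[eps | x_t] = b / (a^2 s^2 + b^2) * (x_t - a * mu),
    where a = sqrt(abar) and b = sqrt(1 - abar).
    """
    def predict(x_t, abar):
        a, b = np.sqrt(abar), np.sqrt(1.0 - abar)
        return b / (a**2 * s**2 + b**2) * (x_t - a * mu)
    return predict

def single_step_recon_error(x0, eps_model, abar, rng, n_samples=32):
    """Noise x0 once to level abar, denoise in a single step, and return
    the squared reconstruction error averaged over n_samples noise draws."""
    a, b = np.sqrt(abar), np.sqrt(1.0 - abar)
    errors = []
    for _ in range(n_samples):
        eps = rng.standard_normal(x0.shape)
        x_t = a * x0 + b * eps              # forward noising q(x_t | x_0)
        eps_hat = eps_model(x_t, abar)      # one denoiser call
        x0_hat = (x_t - b * eps_hat) / a    # one-step estimate of x_0
        errors.append(np.sum((x0 - x0_hat) ** 2))
    return float(np.mean(errors))

# Usage: "training data" centred at the origin with unit variance.
rng = np.random.default_rng(0)
mu = np.zeros(2)
eps_model = gaussian_eps_predictor(mu, s=1.0)
in_dist = single_step_recon_error(np.array([0.1, -0.2]), eps_model, abar=0.5, rng=rng)
ood = single_step_recon_error(np.array([5.0, 5.0]), eps_model, abar=0.5, rng=rng)
print(in_dist < ood)  # True
```

The denoiser pulls its estimate toward the region it was fitted on. As a result, inputs far from that region reconstruct poorly and receive a larger score, and a threshold on this score can then flag them as OOD.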

Key facts

DOSER stands for Diffusion-based OOD Detection and Selective Regularization.
It uses two diffusion models: one for the behavior policy and one for the state distribution.
Single-step denoising reconstruction error serves as the OOD indicator.
It distinguishes beneficial from detrimental OOD actions.
Addresses value overestimation for OOD actions in offline RL.
Published on arXiv with ID 2605.08202.
Proposed as an alternative to methods that uniformly penalize unseen samples.
Aims to avoid suppressing beneficial exploration.
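The sketch below contrasts uniform penalization with a selective penalty. The summary does not say how DOSER decides which OOD actions are detrimental, so the rule used here is purely hypothetical and is not the paper's method. It treats an OOD action as detrimental only when the state it is predicted to lead to is also OOD. Under that assumption, an action score would come from the behavior-policy model and a next-state score from the state model, with the next state supplied by a separate dynamics model that the source does not mention. The names `selective_penalty`, `action_tau`, `state_tau` and `alpha` are all illustrative.

```python
import numpy as np

def selective_penalty(q_values, action_ood_scores, next_state_ood_scores,
                      action_tau, state_tau, alpha=1.0):
    """Hypothetical selective regularizer (NOT DOSER's published rule).

    - In-distribution actions (score <= action_tau): no penalty.
    - OOD actions whose predicted next state still looks in-distribution:
      treated as potentially beneficial, so no penalty.
    - OOD actions that lead to OOD states: treated as detrimental, so Q is
      reduced in proportion to how far the action score exceeds action_tau.
    """
    q = np.asarray(q_values, dtype=float)
    a_s = np.asarray(action_ood_scores, dtype=float)
    s_s = np.asarray(next_state_ood_scores, dtype=float)
    ood_action = a_s > action_tau
    detrimental = ood_action & (s_s > state_tau)
    penalty = np.where(detrimental, alpha * (a_s - action_tau), 0.0)
    return q - penalty, detrimental

# Usage: three actions with equal Q estimates.
q = np.array([1.0, 1.0, 1.0])
action_scores = np.array([0.5, 3.0, 3.0])      # actions 1 and 2 are OOD
next_state_scores = np.array([0.4, 0.6, 4.0])  # only action 2 leads to an OOD state

selective, mask = selective_penalty(q, action_scores, next_state_scores,
                                    action_tau=1.0, state_tau=1.0, alpha=0.5)
# Uniform baseline for contrast: penalize every OOD action the same way.
uniform = q - 0.5 * np.maximum(action_scores - 1.0, 0.0)

print(selective.tolist())  # [1.0, 1.0, 0.0]
print(uniform.tolist())    # [1.0, 0.0, 0.0]
```

In this toy case the uniform baseline also lowers the value of action 1. That action is OOD but leads somewhere familiar. The selective rule leaves it untouched, which mirrors the stated goal of not suppressing beneficial exploration.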

Entities

—

Sources

arXiv cs.AI — 2026-05-12