ARTFEED — Contemporary Art Intelligence

DOSER: Diffusion-Based OOD Detection for Offline RL

other · 2026-05-12

DOSER (Diffusion-based OOD Detection and Selective Regularization) is a new framework that addresses the overestimated values of out-of-distribution (OOD) actions in offline reinforcement learning. Unlike existing methods that uniformly penalize unseen samples, DOSER uses two diffusion models, one to model the behavior policy and one to model the state distribution, and takes the single-step denoising reconstruction error as its OOD indicator. During policy optimization it distinguishes beneficial from detrimental OOD actions, so that useful exploration is not suppressed. The approach is detailed in arXiv:2605.08202.
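
To make the idea of a single-step denoising reconstruction error concrete, here is a minimal sketch in Python. It is not the paper's implementation: a closed-form denoiser for Gaussian data stands in for a trained diffusion model, and the names `gaussian_denoiser`, `reconstruction_error`, `alpha_bar`, and all numeric values are assumptions chosen for illustration. A sample is noised once, denoised once, and scored by how far the reconstruction lands from the original.

```python
import math
import random


def gaussian_denoiser(x_t, alpha_bar, mu, sigma):
    """Posterior mean E[x0 | x_t] when each dimension of x0 ~ N(mu, sigma^2).

    Assumption: this closed form replaces a trained diffusion model's
    one-step prediction of the clean sample.
    """
    root = math.sqrt(alpha_bar)
    gain = root * sigma**2 / (alpha_bar * sigma**2 + (1.0 - alpha_bar))
    return [m + gain * (xt - root * m) for xt, m in zip(x_t, mu)]


def reconstruction_error(x0, alpha_bar, mu, sigma, rng, n_samples=256):
    """Mean squared error between x0 and its single-step reconstruction."""
    root = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    total = 0.0
    for _ in range(n_samples):
        # Forward step: corrupt the sample at one fixed noise level.
        x_t = [root * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]
        # Reverse step: a single denoising pass, no iterative sampling.
        x0_hat = gaussian_denoiser(x_t, alpha_bar, mu, sigma)
        total += sum((a - b) ** 2 for a, b in zip(x0_hat, x0))
    return total / n_samples


# Toy "dataset" distribution: tightly clustered around the origin.
rng = random.Random(0)
mu, sigma, alpha_bar = [0.0, 0.0], 0.1, 0.5

err_in = reconstruction_error([0.05, -0.05], alpha_bar, mu, sigma, rng)
err_ood = reconstruction_error([2.0, -2.0], alpha_bar, mu, sigma, rng)
print("in-distribution error:", err_in)
print("out-of-distribution error:", err_ood)
```

In this toy setting the denoiser pulls every input back toward the data it was fitted to, so a sample far from that data is reconstructed poorly and receives a much larger score. A threshold on the score would then flag OOD inputs; how DOSER sets or uses such a threshold is not stated in the summary above.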

Key facts

  • DOSER stands for Diffusion-based OOD Detection and Selective Regularization.
  • It uses two diffusion models, one for the behavior policy and one for the state distribution.
  • Single-step denoising reconstruction error serves as the OOD indicator.
  • It distinguishes beneficial from detrimental OOD actions.
  • It addresses the overestimation of OOD action values in offline RL.
  • It is published on arXiv with ID 2605.08202.
  • It is proposed as an alternative to uniform penalization methods.
  • It aims to avoid suppressing beneficial exploration.
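
The contrast between uniform penalization and selective regularization can be sketched as follows. This is a hypothetical illustration only: the summary does not say how DOSER decides whether an OOD action is beneficial or detrimental, so that decision is passed in as a placeholder flag (`is_detrimental`), and the function names, `threshold`, and `weight` are all assumptions.

```python
def uniform_penalty(q_value, ood_score, threshold, weight):
    """Baseline behavior: lower the value of every action flagged as OOD."""
    if ood_score > threshold:
        return q_value - weight
    return q_value


def selective_penalty(q_value, ood_score, threshold, weight, is_detrimental):
    """Selective behavior: lower the value only for flagged actions that
    are also judged detrimental.

    Assumption: `is_detrimental` is a placeholder for whatever criterion
    the method actually uses; it is not taken from the paper.
    """
    if ood_score > threshold and is_detrimental:
        return q_value - weight
    return q_value


# An OOD action judged beneficial keeps its value under the selective rule
# but is penalized under the uniform rule.
q, score = 1.0, 0.9
print(uniform_penalty(q, score, threshold=0.5, weight=0.4))
print(selective_penalty(q, score, threshold=0.5, weight=0.4, is_detrimental=False))
```

The point of the sketch is only the branching structure: the uniform rule treats the OOD flag as sufficient reason to penalize, while the selective rule requires a second judgment before it does so.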
