DOSER: Diffusion-Based OOD Detection for Offline RL
DOSER (Diffusion-based OOD Detection and Selective Regularization) is a framework that addresses the overestimation of out-of-distribution (OOD) actions in offline reinforcement learning (RL). Unlike existing methods that uniformly penalize unseen samples, DOSER uses two diffusion models, one for the behavior policy and one for the state distribution, and takes the single-step denoising reconstruction error as its OOD indicator. During policy optimization it distinguishes beneficial from detrimental OOD actions, so that useful exploration is not suppressed. The approach is detailed in arXiv:2605.08202.
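To make the idea of a single-step denoising reconstruction error concrete, here is a minimal toy sketch. It is not the DOSER implementation. Several things are assumptions made for illustration: the closed-form Gaussian denoiser stands in for a trained diffusion model, the function names `gaussian_denoiser` and `reconstruction_error` are hypothetical, and the noise level `alpha_bar`, the averaging over several noise draws, and the 95th-percentile threshold are arbitrary choices not taken from the paper. The sketch also uses one unconditional model on 2-D points, rather than the two models (behavior policy and state distribution) described above.

```python
import numpy as np


def gaussian_denoiser(x_t, alpha_bar, mu, sigma2):
    """Posterior mean E[x0 | x_t] when x0 ~ N(mu, sigma2 * I).

    Assumption: this analytic formula replaces a trained diffusion
    model's clean-sample prediction so the example runs without training.
    """
    gain = np.sqrt(alpha_bar) * sigma2 / (alpha_bar * sigma2 + 1.0 - alpha_bar)
    return mu + gain * (x_t - np.sqrt(alpha_bar) * mu)


def reconstruction_error(x0, alpha_bar, denoise, rng, n_noise=32):
    """Squared error between x0 and its one-step reconstruction.

    Each sample is noised once to level alpha_bar, denoised in a single
    step, and compared with the original. Averaging over n_noise draws
    (an illustrative choice) reduces the variance of the score.
    """
    errs = np.zeros(x0.shape[0])
    for _ in range(n_noise):
        eps = rng.standard_normal(x0.shape)
        # Forward diffusion to noise level alpha_bar.
        x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
        # Single denoising step back to an estimate of the clean sample.
        x0_hat = denoise(x_t, alpha_bar)
        errs += np.sum((x0_hat - x0) ** 2, axis=1)
    return errs / n_noise


rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])   # toy "dataset" mean (hypothetical)
sigma2 = 0.01                # toy "dataset" variance (hypothetical)
alpha_bar = 0.5              # illustrative noise level


def denoise(x_t, a):
    return gaussian_denoiser(x_t, a, mu, sigma2)


# In-distribution points lie near mu; OOD points are shifted far away.
x_in = mu + 0.1 * rng.standard_normal((200, 2))
x_ood = mu + 3.0 + 0.1 * rng.standard_normal((200, 2))

err_in = reconstruction_error(x_in, alpha_bar, denoise, rng)
err_ood = reconstruction_error(x_ood, alpha_bar, denoise, rng)

# Flag a sample as OOD when its error exceeds a threshold calibrated
# on in-distribution data (the percentile is an illustrative choice).
threshold = np.percentile(err_in, 95)
ood_flag_rate = float(np.mean(err_ood > threshold))
print("mean in-distribution error:", err_in.mean())
print("mean OOD error:", err_ood.mean())
print("fraction of OOD points flagged:", ood_flag_rate)
```

In this toy setting the denoiser pulls every noised sample back toward the data it was fit to, so points far from that data reconstruct poorly and receive a high score. How DOSER turns such a score into a selective penalty is described in the paper and is not reproduced here.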
Key facts
- DOSER stands for Diffusion-based OOD Detection and Selective Regularization.
- It uses two diffusion models: one for the behavior policy and one for the state distribution.
- The single-step denoising reconstruction error serves as the OOD indicator.
- It distinguishes beneficial from detrimental OOD actions.
- It addresses the overestimation of OOD actions in offline RL.
- It is published on arXiv with ID 2605.08202.
- It is proposed as an alternative to methods that penalize unseen samples uniformly.
- It aims to avoid suppressing beneficial exploration.