ARTFEED — Contemporary Art Intelligence

Decoupled Guidance Diffusion for Adaptive Offline Safe RL

other · 2026-05-07

A new method called Safe Decoupled Guidance Diffusion (SDGD) has been developed for offline safe reinforcement learning, enabling policies to adapt to varying safety budgets at deployment time. It reinterprets safe trajectory generation as sampling from a constraint-conditioned distribution, in which the cost budget bounds the feasible trajectory set and the reward shapes preferences within those bounds. SDGD applies classifier-free guidance conditioned on the cost limit to steer sampling toward compliant trajectories, and separately applies reward-gradient guidance to push those trajectories toward higher return. This decoupling addresses a weakness of existing guidance strategies, which fold reward improvement and constraint satisfaction into a single competing objective and can therefore violate safety under tight budgets. More detail is in the paper on arXiv with ID 2605.02777v2.
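The decoupled update described above can be illustrated with a toy sketch. This is not the authors' implementation: the noise predictors, the reward-gradient function, and the guidance weights below are all hypothetical stand-ins, chosen only to show how classifier-free guidance on the cost limit and a separate reward-gradient term combine in one denoising step.

```python
import numpy as np

# Toy placeholders for a trained noise model evaluated without and with
# the cost-limit condition, and for the gradient of a learned return
# estimate with respect to the noisy trajectory (all hypothetical).
def toy_eps_uncond(x):
    return 0.1 * x

def toy_eps_cond(x, cost_limit):
    return 0.1 * x - 0.01 * cost_limit

def toy_reward_grad(x):
    return -0.05 * x

def sdgd_step(x, cost_limit, w_safe=2.0, w_reward=0.5):
    """One decoupled guidance update on a noisy trajectory x.

    Classifier-free guidance (conditioned on the cost limit) steers the
    sample toward constraint-compliant trajectories; a separate
    reward-gradient term then nudges it toward higher return.
    """
    eps_u = toy_eps_uncond(x)
    eps_c = toy_eps_cond(x, cost_limit)
    # Classifier-free guidance: extrapolate toward the cost-conditioned
    # noise prediction with strength w_safe.
    eps = eps_u + w_safe * (eps_c - eps_u)
    # Decoupled reward guidance applied on top of the safety-guided step.
    return x - eps + w_reward * toy_reward_grad(x)

traj = np.ones(4)           # a dummy 4-step noisy trajectory
print(sdgd_step(traj, cost_limit=10.0))
```

Because the safety term enters through conditioning rather than through a competing gradient, the cost limit can be changed at deployment time simply by passing a different `cost_limit` value, without retraining.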

Key facts

  • Method is called Safe Decoupled Guidance Diffusion (SDGD)
  • Designed for offline safe reinforcement learning
  • Allows adaptation to varying safety budgets at deployment time
  • Reinterprets trajectory generation as sampling from constrained distribution
  • Uses classifier-free guidance conditioned on cost limit
  • Uses reward-gradient guidance for higher return
  • Addresses unreliability of competing gradient objectives
  • Paper available on arXiv with ID 2605.02777v2

Entities

Institutions

  • arXiv

Sources