SAS: Test-Time Safety Adaptation for Offline RL
A team of researchers has introduced SAS (Self-Alignment for Safety), a transformer-based framework for offline safe reinforcement learning that adapts at test time without retraining. Its self-alignment mechanism has the agent imagine hypothetical trajectories, select those satisfying a Lyapunov condition, and feed them back as in-context prompts that steer behavior toward safety; no parameter updates are performed during adaptation. This turns Lyapunov-guided imagination into control-invariant prompts, and the framework admits a hierarchical reinforcement learning interpretation as Bayesian inference over latent skills. On Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and improves safety.
Key facts
- SAS is a transformer-based framework for offline safe RL.
- It enables test-time adaptation without retraining.
- Self-alignment mechanism generates imagined trajectories and selects Lyapunov-satisfying ones.
- Selected trajectories are used as in-context prompts.
- No parameter updates are needed during adaptation.
- Framework admits hierarchical RL interpretation.
- Tested on Safety Gymnasium and MuJoCo benchmarks.
- Consistently reduces cost and improves safety.
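The selection loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rollout function, the Lyapunov candidate `V`, and the decrease condition (V non-increasing along the trajectory) are all assumptions chosen for clarity.

```python
import numpy as np

def lyapunov_satisfied(traj, V):
    """Assumed Lyapunov condition: V is non-increasing at every step."""
    values = [V(s) for s in traj]
    return all(v_next <= v for v, v_next in zip(values, values[1:]))

def self_align_prompts(initial_state, rollout_fn, V, n_imagined=32, horizon=8, rng=None):
    """Imagine trajectories from the current state and keep only the
    Lyapunov-satisfying ones to serve as in-context prompts.
    Note: no model parameters are updated here, matching the paper's
    claim of adaptation without retraining."""
    rng = rng if rng is not None else np.random.default_rng(0)
    prompts = []
    for _ in range(n_imagined):
        traj = rollout_fn(initial_state, horizon, rng)
        if lyapunov_satisfied(traj, V):
            prompts.append(traj)
    return prompts

# Toy stand-in for an imagination model: states are scalars and the
# "dynamics" usually contract toward 0, but some rollouts expand and
# therefore violate the Lyapunov condition.
def toy_rollout(s0, horizon, rng):
    traj = [s0]
    for _ in range(horizon):
        traj.append(traj[-1] * rng.uniform(0.5, 1.2))
    return traj

V = lambda s: s * s  # simple quadratic Lyapunov candidate
prompts = self_align_prompts(2.0, toy_rollout, V)
```

In a full system the selected `prompts` would be serialized into the transformer's context window, so the policy conditions on safe exemplars rather than updating its weights.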