ARTFEED — Contemporary Art Intelligence

SAS: Test-Time Safety Adaptation for Offline RL

ai-technology · 2026-04-30

Researchers have introduced SAS (Self-Alignment for Safety), a transformer-based framework for offline safe reinforcement learning that adapts at test time without any retraining. The approach relies on self-alignment: the agent imagines hypothetical trajectories, keeps those that satisfy a Lyapunov condition, and feeds the selected ones back as in-context prompts that steer its behavior toward safety. This turns Lyapunov-guided imagination into control-invariant prompts, and the mechanism admits a hierarchical-RL interpretation as Bayesian inference over latent skills. On Safety Gymnasium and MuJoCo benchmarks, SAS consistently lowers cost and improves safety.
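The article does not include code, but the imagine-filter-prompt loop it describes can be sketched in a few lines. Everything below is illustrative: the `lyapunov` function, the random-walk `imagine_trajectory` stand-in for a learned dynamics model, and the candidate count are all assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lyapunov(state):
    # Hypothetical Lyapunov function: squared distance to a safe origin.
    return float(np.dot(state, state))

def imagine_trajectory(start, horizon=5):
    # Stand-in for the agent's learned dynamics model: a short random-walk
    # rollout from the current state (the real framework uses a transformer).
    traj = [start]
    for _ in range(horizon):
        traj.append(traj[-1] + rng.normal(scale=0.1, size=start.shape))
    return traj

def satisfies_lyapunov(traj):
    # Descent condition: V must be non-increasing along the imagined rollout.
    values = [lyapunov(s) for s in traj]
    return all(v_next <= v for v, v_next in zip(values, values[1:]))

def select_prompts(start, n_candidates=64):
    # Self-alignment step: imagine candidate trajectories and keep only the
    # Lyapunov-satisfying ones to serve as in-context prompts. Note that no
    # model parameters are updated anywhere in this loop.
    candidates = [imagine_trajectory(start) for _ in range(n_candidates)]
    return [t for t in candidates if satisfies_lyapunov(t)]

prompts = select_prompts(np.ones(2))
```

In the actual framework the surviving trajectories would be tokenized and prepended to the transformer's context; here they are simply returned as a list to show the filtering step in isolation.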

Key facts

  • SAS is a transformer-based framework for offline safe RL.
  • It enables test-time adaptation without retraining.
  • Self-alignment mechanism generates imagined trajectories and selects Lyapunov-satisfying ones.
  • Selected trajectories are used as in-context prompts.
  • No parameter updates are needed during adaptation.
  • Framework admits hierarchical RL interpretation.
  • Tested on Safety Gymnasium and MuJoCo benchmarks.
  • Consistently reduces cost and improves safety.