ARTFEED — Contemporary Art Intelligence

SAS: Test-Time Safety Adaptation for Offline RL

ai-technology · 2026-04-30

Researchers have introduced SAS (Self-Alignment for Safety), a transformer-based framework for offline safe reinforcement learning that adapts at test time without any retraining. The approach relies on self-alignment: the agent imagines hypothetical trajectories, keeps those that satisfy a Lyapunov condition, and feeds the selected ones back as in-context prompts that steer its behavior toward safety. This turns Lyapunov-guided imagination into control-invariant prompts, and the mechanism admits a hierarchical-RL interpretation as Bayesian inference over latent skills. On Safety Gymnasium and MuJoCo benchmarks, SAS consistently lowers cost and improves safety.
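The article does not include code, but the imagine-filter-prompt loop it describes can be sketched in a few lines. Everything below is illustrative: the `lyapunov` function, the random-walk `imagine_trajectory` stand-in for a learned dynamics model, and the candidate count are all assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def lyapunov(state):
    # Hypothetical Lyapunov function: squared distance to a safe origin.
    return float(np.dot(state, state))

def imagine_trajectory(start, horizon=5):
    # Stand-in for the agent's learned dynamics model: a short random-walk
    # rollout from the current state (the real framework uses a transformer).
    traj = [start]
    for _ in range(horizon):
        traj.append(traj[-1] + rng.normal(scale=0.1, size=start.shape))
    return traj

def satisfies_lyapunov(traj):
    # Descent condition: V must be non-increasing along the imagined rollout.
    values = [lyapunov(s) for s in traj]
    return all(v_next <= v for v, v_next in zip(values, values[1:]))

def select_prompts(start, n_candidates=64):
    # Self-alignment step: imagine candidate trajectories and keep only the
    # Lyapunov-satisfying ones to serve as in-context prompts. Note that no
    # model parameters are updated anywhere in this loop.
    candidates = [imagine_trajectory(start) for _ in range(n_candidates)]
    return [t for t in candidates if satisfies_lyapunov(t)]

prompts = select_prompts(np.ones(2))
```

In the actual framework the surviving trajectories would be tokenized and prepended to the transformer's context; here they are simply returned as a list to show the filtering step in isolation.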

Key facts

  • SAS is a transformer-based framework for offline safe RL.
  • It enables test-time adaptation without retraining.
  • Self-alignment mechanism generates imagined trajectories and selects Lyapunov-satisfying ones.
  • Selected trajectories are used as in-context prompts.
  • No parameter updates are needed during adaptation.
  • Framework admits hierarchical RL interpretation.
  • Tested on Safety Gymnasium and MuJoCo benchmarks.
  • Consistently reduces cost and improves safety.