Sampling-Based Safe Reinforcement Learning Algorithm
A team of researchers has introduced Sampling-Based Safe Reinforcement Learning (SBSRL), a model-based reinforcement learning algorithm that enforces safety constraints across a finite set of sampled dynamics models, enabling safe exploration in continuous environments. Instead of solving the worst-case optimization over uncertain dynamics exactly, the method approximates it using these samples, and it leverages epistemic uncertainty to direct exploration without the need for explicit rewards. Under regularity conditions, the theoretical analysis guarantees safety with high probability throughout learning and establishes a finite-time sample-complexity bound for recovering near-optimal policies. Empirical evaluations demonstrate safe and efficient exploration both in simulation and on real robotic hardware.
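The idea of approximating a worst case by enforcing a constraint across dynamics samples can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the SBSRL implementation: a hypothetical scalar system x_{t+1} = a*x_t + b*u_t, a finite set of sampled (a, b) pairs standing in for the uncertain dynamics, a linear policy u = -k*x, and a state bound |x_t| <= 1.5 as the safety constraint. A candidate policy is accepted only if the constraint holds under every sampled model, so the maximum over samples plays the role of the worst case. The function names are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical toy setting (not from the paper): scalar dynamics
# x_{t+1} = a * x_t + b * u_t with unknown (a, b). We hold a finite set of
# plausible (a, b) samples, e.g. as if drawn from a posterior over models.
Model = Tuple[float, float]


def sample_models(n: int, seed: int = 0) -> List[Model]:
    """Draw n plausible dynamics models around a nominal (a, b) = (1.05, 0.5)."""
    rng = random.Random(seed)
    return [(rng.gauss(1.05, 0.05), rng.gauss(0.5, 0.05)) for _ in range(n)]


def rollout_peak(model: Model, gain: float, x0: float, horizon: int) -> float:
    """Largest |x_t| along a rollout of the linear policy u = -gain * x."""
    a, b = model
    x, peak = x0, abs(x0)
    for _ in range(horizon):
        x = a * x + b * (-gain * x)
        peak = max(peak, abs(x))
    return peak


def worst_case_peak(models: List[Model], gain: float, x0: float, horizon: int) -> float:
    """Sample-based approximation of the worst case: the max over all model samples."""
    return max(rollout_peak(m, gain, x0, horizon) for m in models)


def is_safe(models: List[Model], gain: float, x0: float = 1.0,
            horizon: int = 20, limit: float = 1.5) -> bool:
    """Accept the policy only if |x_t| <= limit under every sampled model."""
    return worst_case_peak(models, gain, x0, horizon) <= limit


if __name__ == "__main__":
    models = sample_models(32)
    # Keep only the candidate gains that satisfy the constraint under all samples.
    print([g for g in (0.0, 0.5, 1.0) if is_safe(models, g)])
```

Taking the maximum over samples is much cheaper than optimizing over the full set of plausible dynamics, but it is only an approximation: a model outside the sample set could still violate the constraint, so the number of samples controls how conservative the check is.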
Key facts
- SBSRL is a model-based RL algorithm.
- It maintains safety by enforcing constraints across sampled dynamics models.
- The method approximates worst-case optimization over uncertain dynamics.
- Exploration is guided by constraining epistemic uncertainty.
- High-probability safety guarantees are derived under regularity conditions.
- A finite-time sample-complexity bound is provided for recovering a near-optimal policy.
- Empirical validation includes simulation and real robotic hardware.
- The paper is available on arXiv with ID 2605.19469.
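Guiding exploration with epistemic uncertainty can also be sketched in Python. As an assumption (the exact exploration objective of SBSRL is not described here), the sketch measures epistemic uncertainty as the disagreement between next-state predictions of sampled models and picks, from a hypothetical discrete action set, the action that maximizes this disagreement while keeping the next state within bounds under every sample. The scalar model and the names sample_models, epistemic_spread, and pick_exploratory_action are hypothetical.

```python
import random
import statistics
from typing import List, Optional, Sequence, Tuple

# Hypothetical scalar dynamics samples (a, b): x_next = a * x + b * u.
Model = Tuple[float, float]


def sample_models(n: int, seed: int = 0) -> List[Model]:
    """Draw n plausible (a, b) models around a nominal (1.0, 0.5)."""
    rng = random.Random(seed)
    return [(rng.gauss(1.0, 0.1), rng.gauss(0.5, 0.1)) for _ in range(n)]


def epistemic_spread(models: List[Model], x: float, u: float) -> float:
    """Disagreement (population std) of next-state predictions across samples."""
    preds = [a * x + b * u for a, b in models]
    return statistics.pstdev(preds)


def pick_exploratory_action(models: List[Model], x: float,
                            actions: Sequence[float],
                            x_limit: float) -> Optional[float]:
    """Among actions whose next state satisfies |x_next| <= x_limit under every
    sampled model, return the one with the largest epistemic spread
    (None if no action is safe under all samples)."""
    safe = [u for u in actions
            if all(abs(a * x + b * u) <= x_limit for a, b in models)]
    if not safe:
        return None
    return max(safe, key=lambda u: epistemic_spread(models, x, u))


if __name__ == "__main__":
    models = sample_models(16)
    print(pick_exploratory_action(models, 0.0, [-2.0, -1.0, 0.0, 1.0, 2.0], 1.0))
```

Maximizing disagreement tends to steer data collection toward regions where the sampled models differ most, which is where new data is most informative about the dynamics; the per-sample safety filter keeps that exploration pessimistic.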
Entities
Institutions
- arXiv