Risk-Gated RL for Safety-Critical Control Under Partial Observability

other · 2026-05-16

Action-conditioned risk gating is a reinforcement learning technique for risk-sensitive control in partially observable Markov decision processes (POMDPs). The method uses a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violations. The predicted risk serves two roles: it is a penalty during value learning, and it is a decision-time gate between optimistic and conservative ensemble value estimates, which allows for the efficient evaluation of low-risk actions. The design aims to reduce both computational cost and model sensitivity relative to belief-space planning.
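The decision-time gate can be illustrated with a minimal sketch. The summary does not specify how the gate combines the two estimates, so everything here is an assumption made for illustration: the optimistic estimate is taken as the ensemble maximum, the conservative estimate as the ensemble minimum, and the gate as a linear blend weighted by a predicted risk in [0, 1]. The names `gated_value` and `select_action` are hypothetical and do not come from the paper.

```python
from typing import Sequence


def gated_value(q_ensemble: Sequence[float], risk: float) -> float:
    """Blend optimistic and conservative ensemble estimates by predicted risk.

    Assumptions (not from the source): optimistic = ensemble max,
    conservative = ensemble min, gate = linear blend weighted by risk.
    """
    risk = min(1.0, max(0.0, risk))  # clamp predicted risk into [0, 1]
    optimistic = max(q_ensemble)
    conservative = min(q_ensemble)
    # Low risk leans on the optimistic estimate, high risk on the conservative one.
    return (1.0 - risk) * optimistic + risk * conservative


def select_action(q_ensembles: Sequence[Sequence[float]],
                  risks: Sequence[float]) -> int:
    """Return the index of the action with the highest gated value."""
    scores = [gated_value(q, r) for q, r in zip(q_ensembles, risks)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two actions, each with a two-member ensemble of value estimates.
    q_ensembles = [[1.0, 3.0], [1.5, 2.0]]
    risks = [0.9, 0.1]  # hypothetical outputs of the risk predictor
    print(select_action(q_ensembles, risks))
```

In this example the first action has the higher optimistic estimate, but its high predicted risk pulls its gated value toward the conservative estimate, so the second, low-risk action is selected.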

Key facts

Method targets risk-sensitive partially observable Markov decision processes.
Uses compact finite-history proxy state.
Learns action-conditioned predictor of near-term safety violation.
Predicted risk used as penalty in value learning.
Predicted risk used as decision-time gate between optimistic and conservative estimates.
Aims to reduce computational cost and model sensitivity.
Published on arXiv as paper 2605.14246.
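
Two of the facts above, the finite-history proxy state and the risk penalty in value learning, can be sketched in a few lines. This is a hedged illustration, not the paper's formulation: it assumes the proxy state is simply the last k observation-action pairs, and that the penalty is subtracted from the reward inside a one-step temporal-difference target. The names `make_history`, `proxy_state`, `penalized_td_target`, and the `penalty_weight` parameter are hypothetical.

```python
from collections import deque
from typing import Deque, Hashable, Tuple


def make_history(k: int) -> Deque[Hashable]:
    """Create a buffer that keeps only the k most recent entries."""
    return deque(maxlen=k)


def proxy_state(history: Deque[Hashable]) -> Tuple[Hashable, ...]:
    """Assumed proxy state: the fixed-length window of recent entries."""
    return tuple(history)


def penalized_td_target(reward: float, risk: float, next_value: float,
                        gamma: float = 0.99, penalty_weight: float = 1.0,
                        done: bool = False) -> float:
    """One-step TD target with the predicted risk subtracted as a penalty.

    Assumption (not from the source): the penalty enters additively,
    scaled by a hypothetical penalty_weight.
    """
    bootstrap = 0.0 if done else gamma * next_value
    return reward - penalty_weight * risk + bootstrap


if __name__ == "__main__":
    history = make_history(k=2)
    for step in [("o0", "a0"), ("o1", "a1"), ("o2", "a2")]:
        history.append(step)  # the oldest entry is dropped once k is exceeded
    print(proxy_state(history))
    print(penalized_td_target(reward=1.0, risk=0.5, next_value=2.0,
                              gamma=0.5, penalty_weight=2.0))
```

A fixed-length window keeps the input size constant regardless of episode length, which is one way a finite-history proxy can avoid maintaining a full belief state.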
