Risk-Gated RL for Safety-Critical Control Under Partial Observability
Action-conditioned risk gating is a reinforcement learning method for risk-sensitive control in partially observable Markov decision processes (POMDPs). It works on a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violations. The predicted risk plays two roles. During value learning it is applied as a penalty. At decision time it acts as a gate between optimistic and conservative value estimates from an ensemble, so that low-risk actions can be evaluated optimistically rather than conservatively. The design aims to reduce both computational cost and model sensitivity relative to belief-space planning.
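A minimal Python sketch of two of the ingredients above, under assumptions the summary does not state. The proxy state is taken to be the last `k` observations, each concatenated with a one-hot encoding of the action that preceded it. The risk penalty is assumed to enter a one-step temporal-difference target linearly, as `r - lam * risk`, with a hypothetical weight `lam`. The names `HistoryProxy` and `risk_penalized_target` are illustrative, not taken from the paper.

```python
from collections import deque
from typing import Optional

import numpy as np


class HistoryProxy:
    """Hypothetical finite-history proxy state: the last k observations, each
    concatenated with a one-hot encoding of the action that preceded it.
    Missing history at the start of an episode is zero-padded."""

    def __init__(self, k: int, obs_dim: int, n_actions: int) -> None:
        self.k = k
        self.width = obs_dim + n_actions
        self.n_actions = n_actions
        self.buf: deque = deque(maxlen=k)  # oldest entries drop out automatically

    def reset(self) -> None:
        self.buf.clear()

    def push(self, obs: np.ndarray, prev_action: Optional[int] = None) -> None:
        onehot = np.zeros(self.n_actions)
        if prev_action is not None:  # no preceding action at episode start
            onehot[prev_action] = 1.0
        self.buf.append(np.concatenate([np.asarray(obs, dtype=float), onehot]))

    def state(self) -> np.ndarray:
        # Fixed-length vector of size k * (obs_dim + n_actions), oldest entry first.
        pad = [np.zeros(self.width)] * (self.k - len(self.buf))
        return np.concatenate(pad + list(self.buf))


def risk_penalized_target(
    reward: float,
    risk: float,
    next_q: np.ndarray,
    gamma: float = 0.99,
    lam: float = 1.0,
    done: bool = False,
) -> float:
    """One-step TD target with the predicted risk subtracted as a penalty:
    y = (r - lam * risk) + gamma * max_a' Q(h', a'), with no bootstrap at terminals.
    The linear penalty and the weight lam are assumptions for illustration."""
    bootstrap = 0.0 if done else gamma * float(np.max(next_q))
    return reward - lam * risk + bootstrap
```

The proxy vector keeps a fixed size however long the episode runs; the history length `k` is a tuning choice the summary does not specify.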
Key facts
- Method targets risk-sensitive partially observable Markov decision processes.
- Uses compact finite-history proxy state.
- Learns action-conditioned predictor of near-term safety violation.
- Predicted risk used as penalty in value learning.
- Predicted risk used as decision-time gate between optimistic and conservative ensemble value estimates.
- Aims to reduce computational cost and model sensitivity relative to belief-space planning.
- Described in arXiv paper 2605.14246.
- Published on arXiv.
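The risk predictor's training targets and the decision-time gate can be sketched as follows, again under assumptions not stated in the summary. A step is labeled risky if a violation occurs within a hypothetical window of `horizon` steps starting with the current transition. The ensemble's optimistic and conservative estimates are taken as its per-action maximum and minimum, and the gate is a hard threshold `tau` on predicted risk. `near_term_labels` and `gated_action` are illustrative names, not the paper's API.

```python
import numpy as np


def near_term_labels(violations, horizon: int) -> np.ndarray:
    """Training targets for an action-conditioned risk predictor.

    violations[t] is 1 if the transition taken at step t (after action a_t)
    violated the safety constraint. label[t] is 1 if any violation occurs in
    steps t .. t + horizon - 1, so each label depends on the action a_t."""
    v = np.asarray(violations, dtype=bool)
    return np.array([float(v[t : t + horizon].any()) for t in range(len(v))])


def gated_action(q_ensemble, risk, tau: float = 0.1):
    """Pick an action by gating between optimistic and conservative values.

    q_ensemble: shape (n_members, n_actions), Q(h, a) from each ensemble member.
    risk: shape (n_actions,), predicted near-term violation probability per action.
    Actions with risk < tau are scored optimistically (ensemble max);
    the rest are scored conservatively (ensemble min)."""
    q = np.asarray(q_ensemble, dtype=float)
    optimistic = q.max(axis=0)
    conservative = q.min(axis=0)
    scores = np.where(np.asarray(risk) < tau, optimistic, conservative)
    return int(np.argmax(scores)), scores


# Action 1 looks best optimistically (3.0) but is predicted to be risky,
# so it is scored by its conservative value (0.0) instead.
action, scores = gated_action(
    [[1.0, 3.0, 2.0], [2.0, 0.0, 2.5]], risk=[0.05, 0.5, 0.05], tau=0.1
)
```

Here action 2 is selected with scores `[2.0, 0.0, 2.5]`. A smooth blend such as `(1 - risk) * optimistic + risk * conservative` is an equally plausible reading of "gate"; the summary does not say which form the paper uses.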
Entities
Institutions
- arXiv