ARTFEED — Contemporary Art Intelligence

Risk-Gated RL for Safety-Critical Control Under Partial Observability

other · 2026-05-16

Action-conditioned risk gating is a new reinforcement learning technique for risk-sensitive control in partially observable Markov decision processes (POMDPs). It uses a compact finite-history proxy state and learns an action-conditioned predictor of near-term safety violations. The predicted risk serves two purposes: it is a penalty during value learning, and at decision time it gates between optimistic and conservative ensemble value estimates, which allows low-risk actions to be evaluated efficiently. Compared with belief-space planning, the design aims to reduce both computational cost and sensitivity to the model.
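The decision-time gate can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the optimistic estimate is the ensemble maximum, the conservative estimate is the ensemble minimum, and the gate is a linear blend weighted by the predicted risk. The names `gated_values` and `select_action` are hypothetical.

```python
from typing import List, Sequence


def gated_values(ensemble_q: Sequence[Sequence[float]],
                 risk: Sequence[float]) -> List[float]:
    """Blend optimistic and conservative ensemble values per action.

    ensemble_q[k][a] is member k's value estimate for action a.
    risk[a] is the predicted probability of a near-term safety violation.
    Assumption: optimistic = ensemble max, conservative = ensemble min,
    and the gate interpolates linearly between them.
    """
    gated = []
    for a in range(len(risk)):
        member_values = [q[a] for q in ensemble_q]
        optimistic = max(member_values)
        conservative = min(member_values)
        g = min(1.0, max(0.0, risk[a]))  # clamp the gate to [0, 1]
        # Low risk -> trust the optimistic estimate; high risk -> conservative.
        gated.append((1.0 - g) * optimistic + g * conservative)
    return gated


def select_action(ensemble_q: Sequence[Sequence[float]],
                  risk: Sequence[float]) -> int:
    """Pick the action with the highest risk-gated value."""
    values = gated_values(ensemble_q, risk)
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    q = [[1.0, 2.0], [0.5, 0.0]]         # two ensemble members, two actions
    print(select_action(q, [0.0, 1.0]))  # action 1 is judged risky
    print(select_action(q, [0.0, 0.0]))  # both actions are judged safe
```

Under these assumptions, an action the predictor flags as risky is scored by its most pessimistic ensemble member, so it is chosen only if even that estimate beats the alternatives.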

Key facts

  • Method targets risk-sensitive partially observable Markov decision processes.
  • Uses compact finite-history proxy state.
  • Learns action-conditioned predictor of near-term safety violation.
  • Predicted risk used as penalty in value learning.
  • Predicted risk used as decision-time gate between optimistic and conservative estimates.
  • Aims to reduce computational cost and model sensitivity.
  • Published on arXiv as paper 2605.14246.
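The first four facts above can be made concrete with a rough sketch of a finite-history proxy state and a risk-penalized value target. This is an illustration under stated assumptions, not the paper's formulation: the proxy state is taken to be the last `k` observation-action pairs, and the penalty is taken to be a weighted subtraction of the predicted risk from the reward in a one-step TD target. `HistoryProxy`, `penalized_td_target`, and `risk_weight` are hypothetical names.

```python
from collections import deque
from typing import Deque, Hashable, Tuple


class HistoryProxy:
    """Fixed-length window of (observation, action) pairs.

    Assumption: the window stands in for the belief state, so no
    belief update or environment model is needed.
    """

    def __init__(self, k: int) -> None:
        self.buffer: Deque[Tuple[Hashable, int]] = deque(maxlen=k)

    def push(self, observation: Hashable, action: int) -> None:
        # deque(maxlen=k) silently drops the oldest pair when full.
        self.buffer.append((observation, action))

    def state(self) -> tuple:
        return tuple(self.buffer)


def penalized_td_target(reward: float,
                        predicted_risk: float,
                        next_value: float,
                        gamma: float = 0.99,
                        risk_weight: float = 1.0,
                        done: bool = False) -> float:
    """One-step TD target with the predicted risk subtracted as a penalty."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward - risk_weight * predicted_risk + bootstrap


if __name__ == "__main__":
    proxy = HistoryProxy(k=2)
    for obs, act in [(0.1, 0), (0.2, 1), (0.3, 0)]:
        proxy.push(obs, act)
    print(proxy.state())  # only the two most recent pairs remain
    print(penalized_td_target(1.0, 0.5, 2.0, gamma=0.5, risk_weight=2.0))
```

A fixed window keeps the state size constant regardless of episode length, which is one plausible reading of how the method avoids the cost of maintaining a belief distribution.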

Entities

Institutions

  • arXiv

Sources