Preference-based Constrained Reinforcement Learning for Safety

other · 2026-05-25

The recently introduced Preference-based Constrained Reinforcement Learning (PbCRL) approach addresses the problem of inferring safety constraints for reinforcement learning from human preferences. Conventional Bradley-Terry models, which are commonly used to learn from pairwise preferences, fail to capture the asymmetric, heavy-tailed nature of safety costs and consequently underestimate risk. PbCRL offers a more efficient alternative that avoids such restrictive assumptions and does not require extensive expert demonstrations, which makes it more applicable in practice. The work, available on arXiv (2603.23565), focuses on learning complex, subjective, and hard-to-specify safety constraints reliably and at low cost.
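For readers unfamiliar with the baseline, the standard Bradley-Terry preference model can be sketched in a few lines of Python. This is a minimal illustration of the generic model, not PbCRL's method. The function name bt_prob_safer and the use of a cumulative segment cost as the model input are assumptions made for illustration. The sketch shows two properties. First, the predicted probability depends only on the cost difference between two segments, so the model is symmetric by construction. Second, the logistic link saturates quickly, so a moderate and an extreme cost gap produce almost the same preference label. This saturation is one plausible intuition for why preferences fitted with a Bradley-Terry model can understate rare, severe costs. It is not the paper's own analysis.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow for large negative x
    return z / (1.0 + z)


def bt_prob_safer(cost_a: float, cost_b: float) -> float:
    """Bradley-Terry probability that segment A is judged safer than B.

    Hypothetical helper for illustration. Assumes the standard logistic
    form over cumulative segment costs:
        P(A safer than B) = sigmoid(cost_b - cost_a)
    """
    return _sigmoid(cost_b - cost_a)


# Symmetry: swapping the pair simply flips the probability.
p_ab = bt_prob_safer(1.0, 3.0)
p_ba = bt_prob_safer(3.0, 1.0)
print(round(p_ab + p_ba, 6))  # prints 1.0

# Saturation: a moderate and an extreme cost gap yield nearly identical
# preference probabilities, so labels alone say little about the magnitude
# of a rare, severe cost.
for gap in (5.0, 50.0):
    print(gap, round(bt_prob_safer(0.0, gap), 4))
# prints:
# 5.0 0.9933
# 50.0 1.0
```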

Key facts

PbCRL is a novel approach for safe reinforcement learning.
It infers safety constraints from human preferences.
Bradley-Terry models underestimate risk because they do not capture asymmetric, heavy-tailed safety costs.
The method does not require extensive expert demonstrations.
The paper is available on arXiv with ID 2603.23565.
It addresses the challenge of specifying complex real-world constraints.