Crowd Preferences Reveal Shared Safety Criteria for RL

ai-technology · 2026-05-23

A recent arXiv preprint (2605.21822) presents Safe Crowd Preference-based RL (SCP-RL), a hierarchical framework for deriving shared safety criteria from crowd preference data. The researchers point out the limitations of direct reward combination, in which a reward model learned from preferences is optimized together with the downstream task reward. SCP-RL instead extracts safety-aligned skills from crowd preferences and composes them through a high-level policy to solve downstream tasks safely. The method is validated with experiments in safe RL environments and a preliminary LLM-style task. The study's central observation is that crowd preferences contain shared safety principles: users may pursue different goals, yet they often follow similar safety criteria.
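To make the contrast concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the 1-D world, the hand-written skills, and every name in it (`combined_reward`, `Skill`, `make_skills`, `high_level_policy`, `rollout`) are assumptions made for illustration. In SCP-RL the skills would be learned from crowd preferences and the high-level policy would be trained on the downstream task.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = int   # position on a 1-D line (toy assumption)
Action = int  # a step of -1, 0, or +1


def combined_reward(task_r: float, pref_r: float, weight: float = 1.0) -> float:
    """Direct combination baseline: a weighted sum of the two reward signals."""
    return task_r + weight * pref_r


@dataclass
class Skill:
    """A low-level behaviour; hand-written here, learned in a real system."""
    name: str
    act: Callable[[State], Action]


def make_skills(hazard: State) -> Dict[str, Skill]:
    # Hypothetical stand-ins for safety-aligned skills: each one refuses
    # to step onto the hazard cell, whatever the high-level policy wants.
    return {
        "advance": Skill("advance", lambda s: 0 if s + 1 == hazard else 1),
        "retreat": Skill("retreat", lambda s: 0 if s - 1 == hazard else -1),
    }


def high_level_policy(state: State, goal: State) -> str:
    """Pick a skill by name; a trivial rule standing in for a learned policy."""
    return "advance" if state < goal else "retreat"


def rollout(start: State, goal: State, hazard: State, max_steps: int = 20) -> List[State]:
    """Run the hierarchy and return the sequence of visited states."""
    skills = make_skills(hazard)
    state = start
    visited = [state]
    for _ in range(max_steps):
        if state == goal:
            break
        skill = skills[high_level_policy(state, goal)]
        state += skill.act(state)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(rollout(start=0, goal=3, hazard=5))             # hazard is off the path
    print(3 in rollout(start=0, goal=5, hazard=3))        # hazard blocks the path
    print(combined_reward(task_r=1.0, pref_r=-0.5, weight=2.0))
```

In this toy, the safety behaviour lives inside the skills, so the high-level policy can only choose among actions that already avoid the hazard. With the weighted sum, whether the agent avoids the hazard depends on how `weight` trades the two signals against each other.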

Key facts

arXiv paper 2605.21822 proposes Safe Crowd Preference-based RL (SCP-RL)
SCP-RL extracts shared safety criteria from crowd preference datasets
Direct reward combination has inherent limitations for safety alignment
Hierarchical framework extracts safety-aligned skills from crowd preferences
Skills are composed via a high-level policy for downstream tasks
Experiments were conducted in safe RL environments and on a preliminary LLM-style task
Crowd preferences contain common safety principles despite diverse user objectives
Method transfers safety criteria from crowd data to downstream RL tasks
