ARTFEED — Contemporary Art Intelligence

Crowd Preferences Reveal Shared Safety Criteria for RL

ai-technology · 2026-05-23

A recent arXiv preprint (2605.21822) presents Safe Crowd Preference-based RL (SCP-RL), a hierarchical framework for deriving shared safety criteria from crowd preference data. The researchers highlight the drawbacks of direct reward combination, in which a reward model learned from preferences is optimized together with the reward of a downstream task. SCP-RL instead extracts safety-aligned skills from crowd preferences and composes them through a high-level policy to solve downstream tasks safely. The method is validated with experiments in safe RL environments and an initial LLM-style task. The study's central observation is that crowd preferences encode shared safety principles: users may pursue different goals, yet they often follow similar safety criteria.
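To make "direct reward combination" concrete, here is a minimal sketch that assumes the combination is a weighted sum of a task reward and a preference-based reward. The weighted-sum form, the `weight` parameter, and the toy numbers are all illustrative assumptions, not details taken from the preprint. The sketch shows only that, under this assumed form, the preferred behaviour depends on a hand-set weight.

```python
# Hypothetical illustration: the preprint's actual reward formulation is not given here.

def combined_reward(task_reward: float, preference_reward: float, weight: float) -> float:
    """Weighted sum of a task reward and a preference-based reward (assumed form)."""
    return task_reward + weight * preference_reward

# Toy candidates: (name, task_reward, preference_reward). Numbers are made up.
CANDIDATES = [
    ("fast_but_risky", 1.0, -0.8),
    ("slow_but_safe", 0.6, 0.5),
]

def best_action(weight: float) -> str:
    """Return the candidate with the highest combined reward for a given weight."""
    return max(CANDIDATES, key=lambda c: combined_reward(c[1], c[2], weight))[0]

if __name__ == "__main__":
    for w in (0.1, 1.0):
        print(w, best_action(w))
```

With a small weight the risky candidate scores highest, and with a larger weight the safe one does, so in this toy setup the outcome hinges on a single scalar.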

Key facts

  • arXiv paper 2605.21822 proposes Safe Crowd Preference-based RL (SCP-RL)
  • SCP-RL extracts shared safety criteria from crowd preference datasets
  • Direct reward combination has inherent limitations for safety alignment
  • Hierarchical framework extracts safety-aligned skills from crowd preferences
  • Skills are composed via a high-level policy for downstream tasks
  • Experiments conducted in safe RL environments and an initial LLM-style task
  • Crowd preferences contain common safety principles despite diverse user objectives
  • Method transfers safety criteria from crowd data to downstream RL tasks
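The hierarchical idea in the list above, safety-aligned skills composed by a high-level policy, can be pictured with a toy sketch. Everything here is a hypothetical stand-in: the corridor environment, the hand-coded skills (which SCP-RL would extract from crowd preferences), and the greedy `high_level_policy` (which would be learned) are assumptions made for illustration, not the paper's method.

```python
from typing import Callable, Dict, List, Tuple

GOAL = 5
HAZARDS = {3}  # cells every skill must avoid (toy assumption)

def make_skill(stride: int) -> Callable[[int], int]:
    """Build a hypothetical safe skill: move by `stride`, or stay put if the landing cell is unsafe."""
    def skill(position: int) -> int:
        target = position + stride
        if target in HAZARDS or target > GOAL:
            return position  # refuse unsafe or out-of-range moves
        return target
    return skill

# Hand-coded here; in the described framework these would come from crowd preferences.
SKILLS: Dict[str, Callable[[int], int]] = {"step": make_skill(1), "hop": make_skill(2)}

def high_level_policy(position: int) -> str:
    """Greedy stand-in for a learned policy: pick the skill that lands closest to the goal."""
    return min(SKILLS, key=lambda name: GOAL - SKILLS[name](position))

def rollout(start: int = 0, max_steps: int = 10) -> Tuple[List[int], List[str]]:
    """Run the high-level policy until the goal is reached or the step budget runs out."""
    position, path, chosen = start, [start], []
    for _ in range(max_steps):
        if position == GOAL:
            break
        name = high_level_policy(position)
        position = SKILLS[name](position)
        path.append(position)
        chosen.append(name)
    return path, chosen

if __name__ == "__main__":
    print(rollout())
```

Because the high-level policy can only choose among skills that already respect the hazard set, the task objective (reaching the goal) never has to be traded off against safety inside a single reward signal in this toy setup.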

Entities

Institutions

  • arXiv
