SENIOR: Efficient Query Selection and Preference-Guided Exploration in PbRL

other · 2026-05-23

A new method called SENIOR improves feedback efficiency and sample efficiency in preference-based reinforcement learning (PbRL). It uses a Motion-Distinction-based Selection scheme (MDS) to pick pairs of behavior segments that show apparent motion in different directions, which makes them easier for humans to label. A preference-guided exploration method (PGE) accelerates policy learning through intrinsic rewards. Together, the two components address key bottlenecks in PbRL applications: the cost of human feedback and the number of environment samples needed.

Key facts

SENIOR is a method for preference-based reinforcement learning.
It improves human feedback efficiency and sample efficiency.
MDS selects segment pairs with apparent motion and different directions.
MDS uses kernel density estimation of states.
PGE is a preference-guided exploration method.
PGE encourages exploration using intrinsic rewards.
The paper is available on arXiv as arXiv:2506.14648v2.
The method avoids reward engineering.
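
The MDS facts above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that "apparent motion" can be approximated by a low mean kernel density of a segment's own states (spread-out states give low density), and that "different directions" can be approximated by a low cosine similarity between net displacement vectors. The names `kde_density`, `mean_density`, `direction`, and `select_pair`, and the parameters `bandwidth` and `top_k`, are hypothetical.

```python
import math
from itertools import combinations


def kde_density(points, query, bandwidth=0.5):
    """Gaussian kernel density estimate at `query`, built from `points`."""
    dim = len(query)
    norm = (bandwidth * math.sqrt(2 * math.pi)) ** dim
    total = 0.0
    for p in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, query))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / (len(points) * norm)


def mean_density(segment, bandwidth=0.5):
    """Mean KDE density of a segment's states under its own estimate.

    Assumption: a low value means the states are spread out, i.e. the
    segment contains apparent motion.
    """
    return sum(kde_density(segment, s, bandwidth) for s in segment) / len(segment)


def direction(segment):
    """Unit vector from the first to the last state (zeros if no net movement)."""
    delta = [b - a for a, b in zip(segment[0], segment[-1])]
    norm = math.sqrt(sum(x * x for x in delta))
    return [x / norm for x in delta] if norm > 0 else delta


def select_pair(segments, top_k=3, bandwidth=0.5):
    """Pick the pair to query (hypothetical two-step rule).

    1. Keep the `top_k` segments with the lowest mean density (most motion).
    2. Among those, return the index pair whose directions are least similar.
    """
    ranked = sorted(range(len(segments)),
                    key=lambda i: mean_density(segments[i], bandwidth))
    candidates = sorted(ranked[:top_k])

    def cosine(pair):
        a, b = (direction(segments[i]) for i in pair)
        return sum(x * y for x, y in zip(a, b))

    return min(combinations(candidates, 2), key=cosine)


# Toy 2-D segments: one stationary, two moving right, one moving left.
still = [[0.0, 0.0]] * 5
right = [[float(t), 0.0] for t in range(5)]
left = [[-float(t), 0.0] for t in range(5)]
right_shifted = [[float(t), 1.0] for t in range(5)]

chosen = select_pair([still, right, left, right_shifted])
```

In this toy setup the stationary segment is filtered out in the first step, and the second step prefers a pair moving in opposite directions over a pair moving the same way. The two-step rule and the use of net displacement are design choices made for the sketch; the source only states that MDS relies on kernel density estimation of states.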
