SENIOR: Efficient Query Selection and Preference-Guided Exploration in PbRL
SENIOR is a method that improves both human feedback efficiency and sample efficiency in preference-based reinforcement learning (PbRL). Its Motion-Distinction-based Selection scheme (MDS) picks pairs of behavior segments that show apparent motion and move in different directions, which makes them easier for humans to compare and label. Its preference-guided exploration method (PGE) accelerates policy learning through intrinsic rewards. Together, the two components target two bottlenecks of applying PbRL: the cost of human feedback and the number of environment samples needed.
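To make the selection idea concrete, here is a minimal sketch of how a motion-and-direction score for segment pairs could be computed. It is not the paper's implementation. It assumes a segment is a list of state tuples, measures motion as one minus the average Gaussian-kernel similarity between the segment's own states (an unnormalized kernel density estimate), and measures direction by the net displacement from first to last state. The function names, the bandwidth value, and the way the two terms are combined are all hypothetical.

```python
import math
from itertools import combinations


def motion_score(states, bandwidth=0.5):
    """Assumed motion measure: 1 - mean Gaussian-kernel similarity of the states.

    The mean kernel value is an unnormalized KDE of the segment evaluated at its
    own states. Clustered states give a value near 1 (score near 0); spread-out
    states give a lower density and therefore a higher score.
    """
    n = len(states)
    total = 0.0
    for x in states:
        for y in states:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return 1.0 - total / (n * n)


def direction(states):
    """Unit vector of the net displacement (all zeros if the segment ends where it starts)."""
    delta = [b - a for a, b in zip(states[0], states[-1])]
    norm = math.sqrt(sum(v * v for v in delta))
    return [v / norm for v in delta] if norm > 0 else delta


def pair_score(seg_a, seg_b, bandwidth=0.5):
    """Hypothetical combination: both segments must move, and in different directions."""
    motion = min(motion_score(seg_a, bandwidth), motion_score(seg_b, bandwidth))
    cosine = sum(u * v for u, v in zip(direction(seg_a), direction(seg_b)))
    return motion * (1.0 - cosine)


def select_pairs(segments, k=1, bandwidth=0.5):
    """Return the k index pairs with the highest pair score."""
    pairs = combinations(range(len(segments)), 2)
    ranked = sorted(
        pairs,
        key=lambda p: pair_score(segments[p[0]], segments[p[1]], bandwidth),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-D segments: one stationary, three moving in different directions.
still = [(0, 0)] * 5
right = [(i, 0) for i in range(5)]
left = [(-i, 0) for i in range(5)]
up = [(0, i) for i in range(5)]
segments = [still, right, left, up]

best = select_pairs(segments, k=1)  # -> [(1, 2)], the right/left pair
```

In this toy case the stationary segment scores zero on motion, so it is never paired, and the two segments moving in opposite directions rank first. A real system would likely work on task-relevant state dimensions rather than raw tuples.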
Key facts
- SENIOR is a method for preference-based reinforcement learning.
- It improves human feedback efficiency and sample efficiency.
- MDS selects segment pairs with apparent motion and different directions.
- MDS uses kernel density estimation of states.
- PGE is a preference-guided exploration method.
- PGE encourages exploration using intrinsic rewards.
- The paper is available on arXiv as arXiv:2506.14648v2.
- As a PbRL method, it learns from human preferences and so avoids hand-engineered reward functions.
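The facts above say only that PGE encourages exploration with intrinsic rewards. As a rough illustration of what a preference-guided bonus might look like, the sketch below adds a bonus to the reward predicted by a learned preference model. The specific form is an assumption made for illustration: the bonus is larger for states that have been visited rarely and that the learned reward rates highly. The class name, the grid-based visit counting, and the parameters `beta` and `cell` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


class PreferenceGuidedBonus:
    """Hypothetical intrinsic-reward shaper: favors preferred, rarely visited states."""

    def __init__(self, beta=0.1, cell=0.5):
        self.beta = beta          # weight of the intrinsic bonus (assumed)
        self.cell = cell          # grid cell size used to count visits (assumed)
        self.visits = Counter()   # visit counts per discretized state

    def _key(self, state):
        # Discretize a continuous state so that visits can be counted.
        return tuple(math.floor(s / self.cell) for s in state)

    def reward(self, state, r_hat):
        """Return (total reward, intrinsic bonus) for one visited state.

        r_hat is the reward predicted by the learned preference model.
        """
        key = self._key(state)
        self.visits[key] += 1
        novelty = 1.0 / math.sqrt(self.visits[key])    # decays with repeat visits
        preference = 1.0 / (1.0 + math.exp(-r_hat))    # squash r_hat into (0, 1)
        intrinsic = preference * novelty
        return r_hat + self.beta * intrinsic, intrinsic


shaper = PreferenceGuidedBonus(beta=0.1, cell=0.5)
total_1, bonus_1 = shaper.reward((0.1, 0.1), r_hat=0.0)  # first visit to this cell
total_2, bonus_2 = shaper.reward((0.2, 0.2), r_hat=0.0)  # same cell, bonus shrinks
total_3, bonus_3 = shaper.reward((3.0, 3.0), r_hat=2.0)  # new cell, higher preference
```

Here the second visit to the same cell earns a smaller bonus than the first, while a new state with a higher predicted reward earns a larger one. Any count-based or density-based novelty estimate could stand in for the grid counter.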
Entities
Institutions
- arXiv