Inverse Reinforcement Learning with Suboptimal Demonstrators
Researchers propose a feasible-reward-set framework for inverse reinforcement learning (IRL) in the setting where demonstrations come from multiple imperfect demonstrators with different levels of suboptimality. Instead of assuming a single optimal demonstrator, the method encodes each demonstrator's declared suboptimality level as a linear constraint on the reward and intersects the resulting feasible sets across demonstrators. The theoretical analysis shows that the joint feasible set shrinks monotonically (it never grows) as data are added, and gives an exact characterization of when a new demonstrator strictly tightens it. Two guarantees for recovering the ground-truth optimal reward set are established: one depends on how close the demonstrations are to the optimal occupancy measure; the other dispenses with a near-optimal demonstrator but requires sufficient coverage. Practical strategies are also introduced for handling unknown suboptimality levels.
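As a minimal sketch of the constraint encoding, assume the simplest possible setting: a single-state, one-step problem (a bandit), where a demonstrator's occupancy measure reduces to its distribution p_i over actions and a reward is a vector r with one entry per action. Under that assumption, a declared suboptimality level eps_i means max_a r[a] - <p_i, r> <= eps_i, which unpacks into one linear inequality per action; the joint feasible set is the set of rewards satisfying every demonstrator's inequalities. The one-step setting, the function names, and this exact form of the constraint are illustrative assumptions, not the paper's formulation.

```python
from typing import List, Sequence, Tuple

# Hypothetical one-step setting: a demonstrator is (action distribution p_i, declared epsilon_i).
Demo = Tuple[Sequence[float], float]
Constraint = Tuple[List[float], float]  # (g, b) encodes the inequality g . r <= b


def linear_constraints(p: Sequence[float], eps: float) -> List[Constraint]:
    """Encode max_a r[a] - <p, r> <= eps as one inequality per action a.

    Row a is (e_a - p) . r <= eps, where e_a is the a-th unit vector.
    """
    n = len(p)
    return [([(1.0 if j == a else 0.0) - p[j] for j in range(n)], eps) for a in range(n)]


def in_joint_feasible_set(r: Sequence[float], demos: Sequence[Demo], tol: float = 1e-9) -> bool:
    """A reward is jointly feasible iff it satisfies every demonstrator's rows (set intersection)."""
    for p, eps in demos:
        for g, b in linear_constraints(p, eps):
            if sum(gj * rj for gj, rj in zip(g, r)) > b + tol:
                return False
    return True


demos = [
    ([0.0, 1.0, 0.0], 0.0),  # declared optimal: always plays action 1
    ([0.5, 0.5, 0.0], 0.2),  # mixes actions 0 and 1, declared gap of at most 0.2
]
print(in_joint_feasible_set([0.9, 1.0, 0.0], demos))  # True
print(in_joint_feasible_set([0.0, 1.0, 0.0], demos))  # False: demonstrator 2's gap is 0.5
```

In this encoding, adding a demonstrator only appends rows, so the feasible set can never grow; that is the monotonicity property in its simplest form.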
Key facts
- IRL typically assumes a single optimal demonstrator
- New framework handles multiple imperfect demonstrators with heterogeneous suboptimality
- Each demonstrator's suboptimality level encoded as a linear constraint
- Joint feasible set shrinks monotonically as data are added
- Exact characterization of when a new demonstrator tightens the set
- Two recovery guarantees for the ground-truth optimal reward set
- One guarantee depends on closeness to optimal occupancy
- Other guarantee does not require a near-optimal demonstrator but does require sufficient coverage
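Monotone shrinkage and the question of whether a new demonstrator strictly tightens the joint set can be probed numerically. The sketch below assumes a one-step setting (rewards as vectors over actions, each demonstrator given as a distribution over actions plus a declared gap eps) and tests feasibility on a uniform grid over [0, 1]^A. It is a discretized illustration, not the paper's exact characterization: a grid point that the new demonstrator rules out proves strict tightening, but finding none does not prove the new constraints are redundant. An exact check could instead maximize each new inequality's left-hand side over the current feasible polyhedron with a linear program. All names, the grid, and the reward bounds are hypothetical.

```python
import itertools
from typing import List, Sequence, Tuple

Demo = Tuple[Sequence[float], float]  # hypothetical: (action distribution, declared epsilon)


def gap(r: Sequence[float], p: Sequence[float]) -> float:
    """Suboptimality of action distribution p under reward r in a one-step problem."""
    return max(r) - sum(pj * rj for pj, rj in zip(p, r))


def restrict(candidates: List[List[float]], demos: Sequence[Demo], tol: float = 1e-9) -> List[List[float]]:
    """Keep only the candidate rewards consistent with every demonstrator's declared gap."""
    return [r for r in candidates if all(gap(r, p) <= eps + tol for p, eps in demos)]


def reward_grid(n_actions: int, steps: int) -> List[List[float]]:
    """Uniform grid over [0, 1]^n_actions: a discretized stand-in for the continuous reward set."""
    levels = [i / steps for i in range(steps + 1)]
    return [list(r) for r in itertools.product(levels, repeat=n_actions)]


def strictly_tightens(candidates: List[List[float]], demos: Sequence[Demo], new_demo: Demo) -> bool:
    """True if some grid reward feasible under demos is ruled out by new_demo.

    True certifies strict tightening; False only means no grid point was excluded.
    """
    before = restrict(candidates, demos)
    after = restrict(before, [new_demo])  # a subset of before by construction: monotone shrinkage
    return len(after) < len(before)


candidates = reward_grid(3, 4)
demos = [([0.0, 1.0, 0.0], 0.25)]
print(strictly_tightens(candidates, demos, ([0.5, 0.5, 0.0], 0.1)))  # True
print(strictly_tightens(candidates, demos, ([0.0, 1.0, 0.0], 0.5)))  # False: implied by the 0.25 bound
```

The second call shows a redundant demonstrator: a looser gap on the same behavior adds no information, so the set does not shrink.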