Inverse Reinforcement Learning with Suboptimal Demonstrators

other · 2026-06-01

Researchers propose a feasible-reward-set framework for inverse reinforcement learning (IRL) in which demonstrations come from multiple imperfect demonstrators with different levels of suboptimality. Rather than assuming a single optimal demonstrator, the method encodes each demonstrator's declared suboptimality level as a linear constraint on the reward and intersects the resulting feasible sets across demonstrators. The theoretical analysis shows that this joint feasible set shrinks monotonically as data are added, and it characterizes exactly when a new demonstrator strictly tightens the set. Two recovery guarantees for the ground-truth optimal reward set are established: one depends on how close the demonstrations are to the optimal occupancy measure, while the other does not need a near-optimal demonstrator but instead requires sufficient coverage. The authors also introduce practical strategies for handling unknown suboptimality levels.
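
One way to write the construction down, assuming a finite MDP where each policy has a state-action occupancy measure and a reward is a vector over state-action pairs (this notation is ours, not necessarily the paper's): the declared suboptimality of demonstrator i becomes one linear inequality in the reward per comparison policy, and the joint set is their intersection. Monotone shrinkage then follows directly from intersecting with one more set.

```latex
% Assumed notation: d^{\pi} = occupancy measure of policy \pi,
% d_i = occupancy measure of demonstrator i, \epsilon_i \ge 0 = its declared suboptimality.
\mathcal{R}_i = \bigl\{ r \in \mathbb{R}^{S \times A} :
  \langle d^{\pi} - d_i,\, r \rangle \le \epsilon_i \ \ \forall \pi \bigr\},
\qquad
\mathcal{R}_{1:n} = \bigcap_{i=1}^{n} \mathcal{R}_i .

% For fixed \pi each condition is linear in r, and adding a demonstrator only removes rewards:
\mathcal{R}_{1:n+1} = \mathcal{R}_{1:n} \cap \mathcal{R}_{n+1} \subseteq \mathcal{R}_{1:n} .
```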

Key facts

IRL typically assumes a single optimal demonstrator
New framework handles multiple imperfect demonstrators with heterogeneous suboptimality
Each demonstrator's suboptimality level encoded as a linear constraint
Joint feasible set shrinks monotonically as data are added
Exact characterization of when a new demonstrator tightens the set
Two recovery guarantees for the ground-truth optimal reward set
One guarantee depends on closeness to optimal occupancy
Other guarantee requires sufficient coverage instead of a near-optimal demonstrator
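
A minimal runnable sketch of the intersect-and-shrink idea, under simplifying assumptions that are ours rather than the paper's: a single-state problem with three actions (so an occupancy measure is just an action-probability vector), the deterministic policies as the comparison set, and a coarse grid of candidate rewards in [-1, 1]^3 standing in for the continuous feasible set. The helpers demonstrator_constraints, feasible_set and tightens are hypothetical names. The tightening test is exact only for the discretized candidates; it is not the paper's characterization.

```python
from itertools import product

# Toy setting (assumption, not from the paper): one state, three actions, so an
# occupancy measure is a probability vector over actions and the comparison
# policies are the three deterministic (one-hot) policies.
N_ACTIONS = 3
COMPARISONS = [tuple(1.0 if j == a else 0.0 for j in range(N_ACTIONS))
               for a in range(N_ACTIONS)]

# Candidate rewards: a coarse grid over [-1, 1]^3 standing in for the
# continuous feasible set (assumption made to keep the sketch dependency-free).
GRID = [tuple(x / 2 for x in c) for c in product(range(-2, 3), repeat=N_ACTIONS)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def demonstrator_constraints(d_demo, eps):
    """Hypothetical helper: half-spaces (a, b), read as dot(a, r) <= b.

    Each says that no comparison policy beats the demonstrator by more than
    its declared suboptimality eps (assumed >= 0) under reward r.
    """
    return [(tuple(p - q for p, q in zip(d_pi, d_demo)), eps) for d_pi in COMPARISONS]


def feasible(r, constraints, tol=1e-9):
    return all(dot(a, r) <= b + tol for a, b in constraints)


def feasible_set(constraints):
    """Grid rewards satisfying every constraint (the joint feasible set)."""
    return {r for r in GRID if feasible(r, constraints)}


def tightens(current_constraints, new_constraints):
    """True if some currently feasible grid reward violates the new constraints."""
    return any(not feasible(r, new_constraints)
               for r in feasible_set(current_constraints))


if __name__ == "__main__":
    c1 = demonstrator_constraints((0.8, 0.2, 0.0), eps=0.5)  # mostly action 0
    c2 = demonstrator_constraints((0.0, 0.5, 0.5), eps=0.2)  # splits actions 1 and 2
    before, after = feasible_set(c1), feasible_set(c1 + c2)
    assert after <= before  # intersecting can only shrink the set
    print(len(before), len(after), tightens(c1, c2))
```

Over a continuous reward polytope, the same tightening question can instead be posed as one linear program per new half-space: maximize its left-hand side over the current set and check whether the optimum exceeds the right-hand side. The grid simply keeps this sketch free of an LP solver.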


Sources

arXiv cs.AI — 2026-06-01