ARTFEED — Contemporary Art Intelligence

Inverse Reinforcement Learning with Suboptimal Demonstrators

other · 2026-06-01

Researchers propose a feasible-reward-set framework for inverse reinforcement learning (IRL) in which demonstrations come from multiple imperfect demonstrators with differing levels of suboptimality. Rather than assuming a single optimal demonstrator, the method encodes each demonstrator's declared suboptimality level as a linear constraint and intersects the resulting feasible sets across demonstrators. The theoretical analysis shows that the joint feasible set can only shrink as data are added, and it gives conditions under which a new demonstrator strictly tightens the set. Two recovery guarantees for the ground-truth optimal reward set are established: one depends on proximity to the optimal occupancy, and the other requires sufficient coverage but does not rely on a near-optimal demonstrator. Practical strategies are also introduced for cases where the suboptimality levels are unknown.

Key facts

  • IRL typically assumes a single optimal demonstrator
  • New framework handles multiple imperfect demonstrators with heterogeneous suboptimality
  • Each demonstrator's suboptimality level encoded as a linear constraint
  • Joint feasible set shrinks monotonically as data are added
  • Exact characterization of when a new demonstrator tightens the set
  • Two recovery guarantees for the ground-truth optimal reward set
  • One guarantee depends on closeness to optimal occupancy
  • Other guarantee requires sufficient coverage but does not need a near-optimal demonstrator
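
The constraint-and-intersect idea summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one plausible reading of the linear constraint: a reward r is feasible for a demonstrator with occupancy mu and declared suboptimality eps if <r, v - mu> <= eps for every vertex v of the occupancy polytope. The toy single-state problem, the function names, the occupancies, and the eps values are all hypothetical and chosen only for illustration.

```python
import numpy as np

def in_feasible_set(reward, vertices, demo_occupancy, eps, tol=1e-9):
    """Check the assumed constraint <reward, v - demo_occupancy> <= eps for every vertex v."""
    gaps = vertices @ reward - demo_occupancy @ reward
    return bool(np.all(gaps <= eps + tol))

def in_joint_feasible_set(reward, vertices, demos):
    """demos is a list of (occupancy, eps) pairs; the joint set is the intersection."""
    return all(in_feasible_set(reward, vertices, mu, eps) for mu, eps in demos)

# Hypothetical toy problem: one state, three actions, so the occupancy
# polytope is the probability simplex and its vertices are the unit vectors.
vertices = np.eye(3)
demo_a = (np.array([0.8, 0.2, 0.0]), 0.3)  # (occupancy, declared suboptimality)
demo_b = (np.array([0.1, 0.9, 0.0]), 0.2)

# Candidate rewards on a coarse grid in [0, 1]^3.
grid = np.linspace(0.0, 1.0, 6)
candidates = np.array([[x, y, z] for x in grid for y in grid for z in grid])

feasible_one = [r for r in candidates if in_joint_feasible_set(r, vertices, [demo_a])]
feasible_two = [r for r in candidates if in_joint_feasible_set(r, vertices, [demo_a, demo_b])]

# Adding a demonstrator can only remove candidates, never add them.
print(len(feasible_one), len(feasible_two))
```

In this sketch the second count can never exceed the first, because every reward that passes both demonstrators' constraints also passes the first one alone. That mirrors the monotone-shrinking property reported in the summary, though only on a finite grid of candidate rewards.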
