Fast Rate Guarantees for Inverse Reinforcement Learning
A recent theoretical study establishes fast statistical rates for entropy-regularized min-max inverse reinforcement learning (Min-Max-IRL) in finite-horizon MDPs with linear reward classes. The authors show that, at the population level and under deterministic dynamics, maximum likelihood estimation coincides with Min-Max-IRL. Exploiting the pseudo-self-concordance of the Min-Max-IRL loss, they prove that both the trajectory-level KL divergence and the squared parameter error decay at rate O(1/n), where n is the number of expert trajectories. The guarantees hold even under misspecification and require no exploration assumptions. The paper also extends reward identifiability to general Borel state and action spaces and gives new results on the derivatives of the soft-optimal value function.
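To make the setting concrete, here is a minimal sketch (not the paper's code) of the soft-optimal value function in a tiny finite-horizon MDP with deterministic dynamics and a linear reward r(s, a) = ⟨theta, phi(s, a)⟩. The feature map `phi`, parameter `theta`, and transition `step` are illustrative assumptions, not taken from the paper.

```python
import math

H = 3                  # horizon
S = [0, 1]             # states
A = [0, 1]             # actions
theta = [1.0, -0.5]    # linear reward parameter (assumed for illustration)

def phi(s, a):
    # toy feature map for the linear reward class
    return [float(s == a), float(a)]

def reward(s, a):
    return sum(t * f for t, f in zip(theta, phi(s, a)))

def step(s, a):
    # deterministic transition: action 1 toggles the state
    return s ^ a

# Soft Bellman backup (entropy regularization with temperature 1):
#   V_h(s) = log sum_a exp( r(s, a) + V_{h+1}(step(s, a)) ),  V_H = 0
V = {(H, s): 0.0 for s in S}
for h in range(H - 1, -1, -1):
    for s in S:
        V[(h, s)] = math.log(sum(
            math.exp(reward(s, a) + V[(h + 1, step(s, a))]) for a in A))

def policy(h, s):
    # soft-optimal (Boltzmann) policy: softmax over soft Q-values
    logits = [reward(s, a) + V[(h + 1, step(s, a))] for a in A]
    z = math.log(sum(math.exp(l) for l in logits))
    return [math.exp(l - z) for l in logits]

print(policy(0, 0))
```

The expert model in this line of work is exactly such a Boltzmann policy; the IRL problem is to recover `theta` from trajectories it generates.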
Key facts
- arXiv:2605.14599v1
- Entropy-regularized min-max inverse reinforcement learning
- Linear reward classes
- Finite-horizon MDPs with Borel state and action spaces
- MLE and Min-Max-IRL equivalence at population level and under deterministic dynamics
- Fast rate O(n^{-1}) for KL divergence and parameter error
- No exploration assumptions required
- Results apply under misspecification
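The O(1/n) "fast rate" means the squared error shrinks linearly in the sample size, as opposed to the slower O(1/√n) typical of worst-case bounds. A rough analogy, purely for intuition and not the paper's estimator: the simplest MLE (a sample mean) already exhibits squared-error decay proportional to 1/n, so quadrupling n should cut the mean squared error by about four.

```python
import random

random.seed(0)

def mse_of_mle(n, trials=2000, p=0.3):
    # Monte Carlo estimate of E[(p_hat - p)^2] for the Bernoulli MLE
    total = 0.0
    for _ in range(trials):
        phat = sum(random.random() < p for _ in range(n)) / n
        total += (phat - p) ** 2
    return total / trials

e100, e400 = mse_of_mle(100), mse_of_mle(400)
print(e100 / e400)  # close to 4: squared error scales like 1/n
```

The paper's contribution is establishing this 1/n scaling for the much harder Min-Max-IRL objective, via pseudo-self-concordance rather than classical parametric arguments.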