FP-IRL: Physics-Constrained Inverse Reinforcement Learning for Unknown Dynamics
Fokker-Planck inverse reinforcement learning (FP-IRL) is a new physics-constrained approach for systems governed by Fokker-Planck dynamics. Unlike conventional inverse reinforcement learning methods, which require the transition function to be prescribed or estimated beforehand, FP-IRL infers both the reward and transition functions simultaneously from trajectory data alone, without requiring access to sampled transitions. The technique exploits a mathematical connection between Markov decision processes (MDPs) and the Fokker-Planck equation, which relates reward maximization in MDPs to free energy minimization. This makes the method particularly useful when the underlying dynamics are unknown or unobservable. The work is available on arXiv under the identifier 2306.10407v3.
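To make the Fokker-Planck connection concrete, here is a minimal 1-D toy sketch, not the paper's actual algorithm: for overdamped Langevin dynamics dx = -V'(x) dt + sqrt(2D) dW, the Fokker-Planck equation has stationary density rho*(x) proportional to exp(-V(x)/D). The potential V (which plays the role of a negative reward in this illustration) can therefore be read off from trajectory data alone, with no sampled one-step transitions of a known model. The quadratic potential, diffusion coefficient, and histogram-based estimator below are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 0.5                       # diffusion coefficient (assumed known here)
dt, n_steps = 0.01, 200_000   # Euler-Maruyama step size and trajectory length

def grad_V(x):
    """Gradient of the illustrative potential V(x) = x^2 (hidden from the learner)."""
    return 2.0 * x

# Forward step: simulate one long trajectory of the Langevin dynamics.
x = np.empty(n_steps)
x[0] = 0.0
for t in range(n_steps - 1):
    x[t + 1] = x[t] - grad_V(x[t]) * dt + np.sqrt(2 * D * dt) * rng.standard_normal()

# "Inverse" step: estimate V up to an additive constant from the empirical
# stationary density, using V(x) = -D * log(rho*(x)) + const.
samples = x[n_steps // 2:]                       # discard burn-in
hist, edges = np.histogram(samples, bins=40, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = hist > 0.1                                # keep well-sampled bins only
V_est = -D * np.log(hist[mask])
V_est -= V_est.min()                             # fix the arbitrary constant

# Compare against the true potential on the same bins.
V_true = centers[mask] ** 2
V_true -= V_true.min()
err = np.mean(np.abs(V_est - V_true))
print(f"mean abs error of recovered potential: {err:.3f}")
```

This only recovers the potential of fixed (uncontrolled) dynamics from density information; FP-IRL itself goes further, jointly inferring reward and transition functions for an MDP via the free-energy correspondence.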
Key facts
- FP-IRL stands for Fokker-Planck inverse reinforcement learning.
- It is a physics-constrained IRL framework.
- It infers reward and transition functions simultaneously from trajectory data.
- No access to sampled transitions is required.
- It applies to systems described by Fokker-Planck dynamics.
- It links reward maximization in MDPs with free energy minimization.
- Conventional IRL methods need the transition function prescribed or estimated a priori.
- The paper is available on arXiv with ID 2306.10407v3.
Entities
Institutions
- arXiv