Drifting Field Policy: One-Step Generative AI for Robotics
Researchers have introduced Drifting Field Policy (DFP), a non-ODE one-step generative policy for robotic manipulation. DFP frames the policy update as a Wasserstein-2 gradient flow on the reverse KL divergence toward a soft target policy. This flow decomposes into two components: ascent toward regions of higher action value and score matching against a reference policy. From this decomposition the authors derive a tractable surrogate loss, resembling behavior cloning on the top-K actions selected by the critics. DFP outperforms ODE-based policies on manipulation tasks in Robomimic and OGBench.
Key facts
- DFP is a non-ODE one-step generative policy.
- Policy update is a reverse-KL Wasserstein-2 gradient flow.
- Gradient decomposes into ascent toward higher action-value regions and score matching.
- Surrogate loss is akin to behavior cloning on top-K critic-selected actions.
- DFP achieves state-of-the-art on Robomimic and OGBench.
- Outperforms ODE-based policies.
- One-step inference.
- Built on the drifting model paradigm.
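The behavior-cloning-style surrogate described above can be sketched in a few lines: sample candidate actions, score them with a critic, keep the top-K, and regress the policy's action toward them. This is a minimal NumPy illustration, not the authors' implementation; the quadratic `critic` and the MSE regression target are hypothetical stand-ins for the learned critics and policy update in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(state, actions):
    # Hypothetical critic Q(s, a): a toy quadratic that prefers actions near 0.5.
    return -np.sum((actions - 0.5) ** 2, axis=-1)

def topk_targets(state, candidates, k=4):
    """Keep the K candidate actions with the highest critic values;
    the policy is then regressed toward them, akin to behavior cloning."""
    q = critic(state, candidates)
    idx = np.argsort(q)[-k:]  # indices of the K highest-value actions
    return candidates[idx]

def surrogate_loss(predicted_action, targets):
    # Behavior-cloning-style MSE toward the top-K critic-selected actions.
    return float(np.mean((targets - predicted_action) ** 2))

state = rng.normal(size=3)
candidates = rng.uniform(0.0, 1.0, size=(32, 2))  # 32 sampled candidate actions
targets = topk_targets(state, candidates, k=4)
loss = surrogate_loss(candidates.mean(axis=0), targets)
```

In the actual method this loss would drive a one-step generator, so inference needs no ODE integration, only a single forward pass of the policy.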
Entities
Institutions
- arXiv