Drifting Field Policy: One-Step Generative AI for Robotics
Researchers have introduced Drifting Field Policy (DFP), a non-ODE one-step generative policy for robotic manipulation. DFP frames the policy update as a Wasserstein-2 gradient flow on the reverse KL divergence toward a soft target policy. This flow decomposes into two components: ascent toward regions of higher action value and score matching against a reference policy. From this decomposition the authors derive a tractable surrogate loss, resembling behavior cloning on the top-K actions selected by the critics. DFP outperforms ODE-based policies on manipulation tasks in Robomimic and OGBench.
Key facts
- DFP is a non-ODE one-step generative policy.
- Policy update is a reverse-KL Wasserstein-2 gradient flow.
- Gradient decomposes into ascent toward higher action-value regions and score matching.
- Surrogate loss is akin to behavior cloning on top-K critic-selected actions.
- DFP achieves state-of-the-art on Robomimic and OGBench.
- Outperforms ODE-based policies.
- One-step inference.
- Built on the drifting model paradigm.
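The behavior-cloning-style surrogate described above can be sketched in a few lines: sample candidate actions, score them with a critic, keep the top-K, and regress the policy's action toward them. This is a minimal NumPy illustration, not the authors' implementation; the quadratic `critic` and the MSE regression target are hypothetical stand-ins for the learned critics and policy update in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(state, actions):
    # Hypothetical critic Q(s, a): a toy quadratic that prefers actions near 0.5.
    return -np.sum((actions - 0.5) ** 2, axis=-1)

def topk_targets(state, candidates, k=4):
    """Keep the K candidate actions with the highest critic values;
    the policy is then regressed toward them, akin to behavior cloning."""
    q = critic(state, candidates)
    idx = np.argsort(q)[-k:]  # indices of the K highest-value actions
    return candidates[idx]

def surrogate_loss(predicted_action, targets):
    # Behavior-cloning-style MSE toward the top-K critic-selected actions.
    return float(np.mean((targets - predicted_action) ** 2))

state = rng.normal(size=3)
candidates = rng.uniform(0.0, 1.0, size=(32, 2))  # 32 sampled candidate actions
targets = topk_targets(state, candidates, k=4)
loss = surrogate_loss(candidates.mean(axis=0), targets)
```

In the actual method this loss would drive a one-step generator, so inference needs no ODE integration, only a single forward pass of the policy.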
Entities
Institutions
- arXiv