ARTFEED — Contemporary Art Intelligence

Drifting Field Policy: One-Step Generative AI for Robotics

ai-technology · 2026-05-11

Researchers have introduced Drifting Field Policy (DFP), a non-ODE one-step generative policy for robotic manipulation. DFP casts the policy update as a reverse-KL Wasserstein-2 gradient flow toward a soft target policy, whose gradient decomposes into two components: ascent toward regions of higher action value, and score matching with a reference policy. From this decomposition the authors derive a tractable surrogate loss, akin to behavior cloning, computed on the top-K actions selected by critics. DFP achieves state-of-the-art results on manipulation tasks in Robomimic and OGBench, outperforming ODE-based policies.
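To make the decomposition concrete, here is one standard way to write such a soft target and the velocity field of its reverse-KL Wasserstein-2 flow. The notation (critic Q, reference policy \pi_{\mathrm{ref}}, temperature \alpha) is assumed for illustration, not taken from the paper:

    \pi^{*}(a \mid s) \;\propto\; \pi_{\mathrm{ref}}(a \mid s)\,\exp\!\big(Q(s,a)/\alpha\big)

    v_t(a) \;=\; -\nabla_a \log \frac{\pi_t(a \mid s)}{\pi^{*}(a \mid s)}
           \;=\; \underbrace{\tfrac{1}{\alpha}\,\nabla_a Q(s,a)}_{\text{value ascent}}
           \;+\; \underbrace{\nabla_a \log \pi_{\mathrm{ref}}(a \mid s) - \nabla_a \log \pi_t(a \mid s)}_{\text{score matching}}

The first term pushes sampled actions uphill on the critic; the second pulls the policy's score toward the reference policy's, which is what the behavior-cloning-like surrogate approximates in practice.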

Key facts

  • DFP is a non-ODE one-step generative policy.
  • Policy update is a reverse-KL Wasserstein-2 gradient flow.
  • Gradient decomposes into ascent toward higher action-value regions and score matching.
  • Surrogate loss is akin to behavior cloning on top-K critic-selected actions (see the sketch after this list).
  • DFP achieves state-of-the-art on Robomimic and OGBench.
  • Outperforms ODE-based policies.
  • One-step inference.
  • Built on the drifting model paradigm.
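
As a rough illustration of the surrogate loss above, the sketch below samples candidate actions, keeps the top K under the critic, and regresses a one-step generator onto them with a behavior-cloning-style objective. This is a minimal sketch in PyTorch; the interfaces (policy, critic, ref_policy and their shapes) are hypothetical assumptions, not the paper's API:

    import torch
    import torch.nn.functional as F

    def topk_bc_loss(policy, critic, ref_policy, states, n_cand=32, k=4):
        # Hypothetical top-K behavior-cloning surrogate; states: (B, state_dim).
        B, a_dim = states.shape[0], policy.action_dim
        # 1. Draw candidate actions from the reference policy: (B, n_cand, a_dim).
        cands = ref_policy.sample(states, n_cand)
        # 2. Score every candidate with the critic: (B, n_cand).
        q = critic(states.unsqueeze(1).expand(-1, n_cand, -1), cands)
        # 3. Keep the K highest-valued candidates per state as regression targets.
        idx = q.topk(k, dim=1).indices                                  # (B, K)
        targets = torch.gather(cands, 1, idx.unsqueeze(-1).expand(-1, -1, a_dim))
        # 4. One-step generation: a single forward pass maps Gaussian noise
        #    to actions, with no ODE integration loop.
        noise = torch.randn(B, k, a_dim, device=states.device)
        actions = policy(states.unsqueeze(1).expand(-1, k, -1), noise)
        # 5. Behavior-cloning-style regression onto the selected actions.
        return F.mse_loss(actions, targets)

Because the generator is trained to land on good actions in a single pass, inference reduces to one call, policy(state, noise), with fresh Gaussian noise; that single pass is where the speedup over multi-step ODE-based policies comes from.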

Entities

Institutions

  • arXiv
