ARTFEED — Contemporary Art Intelligence

Score-Based One-step MeanFlow Policy Optimization

other · 2026-05-25

Score-Based One-step MeanFlow Policy Optimization (SOM) is an actor-critic reinforcement learning algorithm. It targets the computational cost of diffusion and flow-matching policies, which produce an action through many iterative sampling steps, by learning a direct one-step mapping from noise to data. Using score estimation and a probability flow ODE, SOM constructs the target velocity field directly from the Q-function, so no samples from the target distribution are needed. In online reinforcement learning, SOM achieves state-of-the-art performance on locomotion tasks while using only a single generation step.
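How a velocity field can be built from a Q-function without target samples can be sketched as follows. This is an illustration under stated assumptions, not SOM's published estimator: it assumes a Boltzmann-style target pi(a) proportional to exp(Q(a)/alpha) for one fixed state, the linear path z_t = (1 - t)a + t*eps with eps ~ N(0, I) (t = 0 data, t = 1 noise), and a self-normalized importance-sampling estimate of the noised score. For that path the probability flow ODE velocity is v(z, t) = -(z + t * grad log p_t(z)) / (1 - t). The names `snis_score`, `velocity_from_score`, `MU`, and `S` are hypothetical.

```python
import numpy as np

def snis_score(z, t, q_fn, alpha, n_samples=200_000, rng=None):
    """Monte Carlo estimate of grad_z log p_t(z), the score of the noised target.

    Assumed setup (illustrative, not taken from the paper):
      target  pi(a) proportional to exp(q_fn(a) / alpha), for one fixed state
      path    z_t = (1 - t) * a + t * eps,  eps ~ N(0, I),  0 < t < 1
    Self-normalized importance sampling over the posterior p(a | z_t = z): the
    proposal N(z / (1 - t), (t / (1 - t))^2 I) is the Gaussian kernel p(z | a)
    read as a density in a, so the importance weights reduce to exp(Q / alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = np.atleast_1d(np.asarray(z, dtype=float))
    a = z / (1.0 - t) + (t / (1.0 - t)) * rng.standard_normal((n_samples, z.size))
    log_w = q_fn(a) / alpha
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    # Score of the Gaussian kernel N(z; (1 - t) a, t^2 I) with respect to z.
    cond_score = -(z - (1.0 - t) * a) / t**2
    return w @ cond_score  # weighted posterior mean of the kernel score

def velocity_from_score(z, t, score):
    """Probability flow ODE velocity dz/dt for the linear path above."""
    return -(np.asarray(z, dtype=float) + t * score) / (1.0 - t)

# Usage: a quadratic Q makes pi = N(MU, S^2), so the estimate can be checked in closed form.
MU, S = 1.0, 0.5

def q(a):
    """Hypothetical critic for one state: Q(a) = -|a - MU|^2 / (2 S^2)."""
    return -((a - MU) ** 2).sum(axis=-1) / (2 * S**2)

z, t = np.array([0.3]), 0.5
score = snis_score(z, t, q, alpha=1.0, rng=np.random.default_rng(0))
print(score, velocity_from_score(z, t, score))  # approx [0.64] [-1.24]
```

The quadratic Q is chosen only because the noised marginal is then Gaussian, giving exact reference values (score 0.64, velocity -1.24 at z = 0.3, t = 0.5). With a learned critic the same estimator applies unchanged, although importance-sampling variance grows with dimension and with sharper Q landscapes.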

Key facts

  • SOM is an actor-critic algorithm for reinforcement learning.
  • It uses a single-step mapping from noise to data.
  • The target velocity field is constructed from the Q-function via score estimation and a probability flow ODE.
  • SOM eliminates the need for samples from the target distribution.
  • It achieves state-of-the-art performance on locomotion tasks in online RL.
  • SOM requires only a single generation step.
  • The method is based on MeanFlow, which learns an average velocity field.
  • The paper is published on arXiv with ID 2605.23365.
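Why an average velocity field allows a single generation step can be shown on a toy problem where the flow is known in closed form. The sketch below is not SOM and not the paper's code. It assumes a 1-D Gaussian target N(MU, S^2), standard normal noise, and the linear path z_t = (1 - t)x + t*eps with t = 1 as noise and t = 0 as data; all names are hypothetical. It compares Euler integration of the instantaneous velocity v(z, t) with one jump using the average velocity u(z_t, r, t) = (z_t - z_r)/(t - r).

```python
import numpy as np

# Toy 1-D setup (hypothetical): data x ~ N(MU, S^2), noise eps ~ N(0, 1),
# path z_t = (1 - t) * x + t * eps, with t = 1 noise and t = 0 data.
MU, S = 1.0, 0.5

def var_t(t):
    """Variance of z_t under the toy setup."""
    return (1.0 - t) ** 2 * S**2 + t**2

def inst_velocity(z, t):
    """Exact marginal (instantaneous) velocity v(z, t) = E[eps - x | z_t = z]."""
    return -MU + (t - (1.0 - t) * S**2) * (z - (1.0 - t) * MU) / var_t(t)

def flow_map(z_t, t, r):
    """Exact ODE solution: transport z_t from time t to time r along the flow."""
    return (1.0 - r) * MU + np.sqrt(var_t(r) / var_t(t)) * (z_t - (1.0 - t) * MU)

def avg_velocity(z_t, r, t):
    """Average velocity over [r, t]: u(z_t, r, t) = (z_t - z_r) / (t - r)."""
    return (z_t - flow_map(z_t, t, r)) / (t - r)

def euler_sample(eps, n_steps):
    """Euler integration of dz/dt = v(z, t) from t = 1 down to t = 0."""
    z, dt = eps.copy(), 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt
        z = z - dt * inst_velocity(z, t)
    return z

def one_step_sample(eps):
    """One jump with the average velocity over [0, 1]: z_0 = z_1 - u(z_1, 0, 1)."""
    return eps - avg_velocity(eps, 0.0, 1.0)

eps = np.random.default_rng(0).standard_normal(5)
exact = MU + S * eps  # closed-form endpoint of the flow for each noise sample
print(np.abs(one_step_sample(eps) - exact).max())     # ~0, float rounding only
print(np.abs(euler_sample(eps, 1000) - exact).max())  # small discretization error
print(euler_sample(eps, 1))                           # every entry approx MU
```

A single Euler step with the instantaneous velocity collapses every sample onto the mean, whereas one step with the exact average velocity reproduces the full distribution. In a MeanFlow-style policy a network approximates u instead of the closed form used here.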

Entities

Institutions

  • arXiv

Sources