Stochastic MeanFlow Policies: One-Step Generative Control in RL
The paper introduces Stochastic MeanFlow Policies (SMFP), a one-step generative policy class for reinforcement learning (RL) that combines entropy regularization with mirror descent constraints. SMFP addresses the difficulty Gaussian policies have in representing multimodal action distributions, and it avoids the iterative sampling that other generative policies require. The approach unifies soft policy improvement and mirror descent by minimizing a different KL divergence for each, which encourages exploration while stabilizing policy updates. The method targets online off-policy RL, where it offers tractable entropy estimates together with the expressive power of a generative policy.
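For readers unfamiliar with the two updates being unified, their standard textbook forms are sketched below. The notation (temperature \alpha, step size \eta, normalizer Z) is the conventional one from the soft actor-critic and mirror descent literature and is an assumption here; the paper's own objectives may be written differently.

```latex
% Soft policy improvement: project onto the Boltzmann distribution of the soft Q-function.
\pi_{\text{new}}(\cdot \mid s)
  = \arg\min_{\pi}\;
    D_{\mathrm{KL}}\!\left(
      \pi(\cdot \mid s)
      \,\middle\|\,
      \frac{\exp\!\big(Q^{\pi_{\text{old}}}(s,\cdot)/\alpha\big)}{Z_{\alpha}(s)}
    \right)

% Mirror descent: improve the expected value while staying close to the previous policy.
\pi_{k+1}(\cdot \mid s)
  = \arg\max_{\pi}\;
    \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\big[Q^{\pi_k}(s,a)\big]
    - \frac{1}{\eta}\,
      D_{\mathrm{KL}}\!\big(\pi(\cdot \mid s)\,\big\|\,\pi_k(\cdot \mid s)\big)

% Equivalent KL form of the mirror descent step.
\pi_{k+1}(\cdot \mid s)
  = \arg\min_{\pi}\;
    D_{\mathrm{KL}}\!\left(
      \pi(\cdot \mid s)
      \,\middle\|\,
      \frac{\pi_k(\cdot \mid s)\,\exp\!\big(\eta\,Q^{\pi_k}(s,\cdot)\big)}{Z_{\eta}(s)}
    \right)
```

Written this way, both updates are KL projections that differ only in the target distribution: the first uses a Boltzmann distribution over Q-values, the second reweights the previous policy by exponentiated Q-values.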
Key facts
- SMFP is a one-step generative policy class for reinforcement learning.
- It combines entropy regularization with mirror descent constraints.
- Gaussian policies struggle with multimodal action distributions.
- SMFP avoids the iterative sampling required by other generative policies.
- The method unifies soft policy improvement and mirror descent.
- It supports exploration while stabilizing policy improvement.
- The paper is published on arXiv with ID 2605.21282v2.
- The approach offers tractable entropy estimates.
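The claim that Gaussian policies struggle with multimodal action distributions can be illustrated with a small toy experiment. The sketch below is not SMFP and is not taken from the paper. It assumes a hypothetical one-dimensional action target with two modes at -1 and +1, fits a single Gaussian to it by maximum likelihood, and compares how often each distribution produces actions near a mode. All function names and constants are illustrative.

```python
import random
import statistics


def sample_bimodal_actions(n, rng):
    """Draw n actions from a hypothetical two-mode target centred at -1 and +1."""
    return [rng.gauss(rng.choice([-1.0, 1.0]), 0.1) for _ in range(n)]


def fit_gaussian(samples):
    """Maximum-likelihood Gaussian fit: sample mean and population std."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def fraction_near_modes(samples, modes=(-1.0, 1.0), tol=0.3):
    """Fraction of samples lying within tol of at least one mode."""
    hits = sum(any(abs(a - m) <= tol for m in modes) for a in samples)
    return hits / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Samples from the multimodal target a good policy should imitate.
target = sample_bimodal_actions(10_000, rng)

# A unimodal Gaussian policy fitted to those samples.
mu, sigma = fit_gaussian(target)
gaussian_actions = [rng.gauss(mu, sigma) for _ in range(10_000)]

target_hit = fraction_near_modes(target)
gaussian_hit = fraction_near_modes(gaussian_actions)

print(f"fitted mean={mu:.3f}, std={sigma:.3f}")
print(f"target actions near a mode:   {target_hit:.2%}")
print(f"Gaussian actions near a mode: {gaussian_hit:.2%}")
```

The fitted Gaussian centres itself between the two modes and widens to cover both, so most of its samples land in regions the target rarely visits. A more expressive policy class is meant to avoid exactly this averaging effect.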