Q-Flow: Stable RL with Flow-Based Policy

ai-technology · 2026-05-14

Q-Flow is a new reinforcement learning framework that uses flow-based models as decision-making policies. It targets the instability of gradient-based optimization for such policies by propagating terminal trajectory value to intermediate latent states through deterministic flow dynamics. Because value gradients are then available at intermediate states, the policy can be optimized without backpropagating through a numerical solver, which keeps training stable without sacrificing expressive capacity. The approach is presented as resolving the trade-off between optimization stability and representational flexibility found in existing methods.
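
As a rough illustration of the value-propagation idea, the toy sketch below integrates a deterministic flow from an initial latent state to a final action, then labels every intermediate state with the value of the endpoint it flows to. The velocity field, the critic, the goal point, and the function names (`velocity`, `terminal_value`, `rollout`, `propagate_value`) are hypothetical stand-ins and not taken from the paper. Fixed-step Euler integration is assumed here in place of whatever solver the authors use.

```python
import numpy as np

GOAL = np.array([1.0, -1.0])  # hypothetical high-value action, for illustration only


def velocity(z, t):
    """Hypothetical velocity field of the flow policy: drifts z toward GOAL."""
    return GOAL - z


def terminal_value(action):
    """Hypothetical critic scoring the final action (peak value 0 at GOAL)."""
    return -float(np.sum((action - GOAL) ** 2))


def rollout(z0, n_steps=10):
    """Fixed-step Euler integration of dz/dt = velocity(z, t) from t=0 to t=1."""
    dt = 1.0 / n_steps
    states = [np.asarray(z0, dtype=float)]
    for k in range(n_steps):
        states.append(states[-1] + dt * velocity(states[-1], k * dt))
    return states


def propagate_value(z0, n_steps=10):
    """Label every intermediate state with the value of the endpoint it flows to.

    Because the flow is deterministic, each intermediate state reaches exactly
    one terminal action, so that action's value can be assigned to the state.
    """
    states = rollout(z0, n_steps)
    q_terminal = terminal_value(states[-1])
    dt = 1.0 / n_steps
    return [(k * dt, state, q_terminal) for k, state in enumerate(states)]


# Each entry is (time, intermediate latent state, propagated terminal value).
targets = propagate_value([0.0, 0.0])
```

Triples like these could serve as regression targets for a value function defined on intermediate states, whose gradient can then be evaluated directly at a given state instead of being backpropagated through the solver steps. That use is an assumption about how the idea might be realized, not a description of the paper's training procedure.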

Key facts

Q-Flow is a reinforcement learning framework using flow-based models as policies.
It propagates terminal trajectory value to intermediate latent states via flow dynamics.
The method avoids backpropagating through numerical solvers.
It enables stable policy optimization using value gradients at intermediate states.
Existing approaches obtain stability by restricting the policy's expressive capacity.
Q-Flow bridges the gap between optimization stability and representational flexibility.
The paper is published on arXiv with ID 2605.13435.
The flow dynamics used to propagate value are deterministic.
