Dynamic-TreeRPO: Structured Sampling for RL in T2I Generation

ai-technology · 2026-05-18

Dynamic-TreeRPO integrates reinforcement learning (RL) into flow matching models for text-to-image (T2I) generation. It targets the bottleneck of independent trajectory sampling by implementing a sliding-window sampling strategy as a tree-structured search, with noise intensity varied dynamically along tree depth. GRPO-guided optimization and constrained stochastic differential equation (SDE) sampling are performed within this tree. Because trajectories share prefix paths, the overhead of trajectory search is amortized across branches, so exploration becomes more varied without extra computational cost, with the stated aim of improving the quality of the generated images.
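
The saving from prefix sharing can be illustrated with a simple step count. The sketch below is not the paper's implementation: the function names, the per-level branching factors, and the per-level step counts are all hypothetical choices made for illustration. It compares the number of denoising steps needed when every leaf trajectory is sampled from scratch with the number needed when trajectories reuse their shared prefixes.

```python
from math import prod


def independent_cost(branching, steps_per_level):
    """Denoising steps if every leaf trajectory is sampled from scratch."""
    leaves = prod(branching)
    return leaves * sum(steps_per_level)


def tree_cost(branching, steps_per_level):
    """Denoising steps if trajectories share their prefix segments."""
    total = 0
    nodes = 1
    for b, steps in zip(branching, steps_per_level):
        nodes *= b               # distinct segments at this depth
        total += nodes * steps   # each segment is computed once
    return total


# Hypothetical setup: 3 levels, 2 branches per level, 4 steps per level.
branching = [2, 2, 2]
steps_per_level = [4, 4, 4]

print(independent_cost(branching, steps_per_level))  # 8 leaves * 12 steps = 96
print(tree_cost(branching, steps_per_level))         # 8 + 16 + 32 = 56
```

Under these assumed numbers, both strategies produce eight final samples, but the tree computes each shared segment only once.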

Key facts

Dynamic-TreeRPO integrates RL into flow matching models for T2I generation.
It implements a sliding-window sampling strategy as a tree-structured search.
Dynamic noise intensities are applied along tree depth.
GRPO-guided optimization and constrained SDE sampling are used.
Prefix path sharing amortizes trajectory search overhead.
The method increases exploration diversity without extra computational cost.
It addresses the bottleneck of independent trajectory sampling.
The paper is available on arXiv under ID 2509.23352.
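
For readers unfamiliar with GRPO, its core step is a group-relative advantage: each sample's reward is normalized against the mean and standard deviation of its sampling group. The sketch below shows only that standard normalization. How Dynamic-TreeRPO forms groups within the tree is not described here, so the flat reward list, the function name, and the `eps` constant are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampling group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps keeps the division defined when all rewards are equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four images sampled from the same prompt.
advantages = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
print(advantages)  # below-average samples get negative advantages
```

A group whose rewards barely differ yields advantages close to zero, which is one way to see why low variation within a sampling group gives a weak learning signal.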

Sources

arXiv cs.AI — 2026-05-18