DIDR: A Principled RL Framework for One-Step Text-to-Image Generators

publication · 2026-05-26

Researchers have introduced Diff-Instruct with Diffused Reward (DIDR), a data-free framework for aligning one-step text-to-image generators. Derived from Integral KL minimization, DIDR propagates the RLHF-optimal reward-tilted clean-image distribution across all noise levels along the diffusion path. This addresses the mismatch between terminal reward optimization and generative dynamics in earlier reinforcement learning approaches, whose optimization tended to improve reward at the expense of image fidelity. The paper is available on arXiv under ID 2605.24001.
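The quantities named in this summary can be written out in standard RLHF and Diff-Instruct notation. The block below is a sketch under assumed notation, not the paper's exact objective: the symbols c (prompt), r (reward model), beta (regularization strength), q_t (forward diffusion kernel), w(t) (time weighting) and p_theta (generator distribution) are all introduced here for illustration, and the paper may define or weight these terms differently.

```latex
% Assumed notation (not taken from the paper):
%   c       prompt                     x_0   clean image
%   r       reward model               beta  KL-regularization strength, beta > 0
%   q_t     forward diffusion kernel   w(t)  non-negative time weighting
%   p_theta distribution of the one-step generator's outputs

% RLHF-optimal reward-tilted distribution over clean images
p^{*}(x_0 \mid c) \;\propto\; p_{\mathrm{ref}}(x_0 \mid c)\,\exp\!\big(r(x_0, c)/\beta\big)

% The same distribution carried to noise level t by the forward diffusion
p^{*}_{t}(x_t \mid c) \;=\; \int q_t(x_t \mid x_0)\, p^{*}(x_0 \mid c)\, \mathrm{d}x_0

% Integral KL: KL between generator and target marginals, integrated over noise levels
\mathcal{D}_{\mathrm{IKL}}(p_\theta \,\|\, p^{*}) \;=\; \int_{0}^{T} w(t)\, \mathrm{KL}\big(p_{\theta,t} \,\|\, p^{*}_{t}\big)\, \mathrm{d}t
```

Read this way, the reward enters through the target marginals at every noise level rather than only through a terminal image-space term, which is the mismatch the summary says DIDR is meant to remove.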

Key facts

DIDR is a data-free trajectory-level alignment framework for one-step text-to-image generators.
It is derived from Integral KL minimization.
DIDR propagates the RLHF-optimal reward-tilted clean-image distribution across all noise levels.
It addresses the mismatch between terminal reward optimization and generative dynamics.
Previous RL methods for one-step generators combined image-space reward optimization with diffusion noisy-space distribution matching.
The optimization in previous methods tended to exploit stochastic degrees of freedom, improving reward at the expense of image fidelity.
The paper is available on arXiv with ID 2605.24001.
The framework aims to put RL for one-step generators on a principled footing.
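The ideas listed above (reward tilting, propagation across noise levels, Integral KL) can be made concrete with a one-dimensional toy in which everything has a closed form. This is a minimal sketch, not the DIDR algorithm: the Gaussian reference distribution, the quadratic reward, the cosine noise schedule, the uniform time weighting and all function names are assumptions chosen only so the numbers can be checked by hand.

```python
import math


def tilt_gaussian(mu, var, target, beta):
    """Reward-tilt N(mu, var) by exp(r(x) / beta), with the assumed
    quadratic reward r(x) = -(x - target)**2 / 2. The result is Gaussian."""
    precision = 1.0 / var + 1.0 / beta
    mean = (mu / var + target / beta) / precision
    return mean, 1.0 / precision


def diffuse(mean, var, t):
    """Marginal of x_t = alpha_t * x_0 + sigma_t * eps for x_0 ~ N(mean, var),
    using an assumed cosine schedule with alpha_t**2 + sigma_t**2 = 1."""
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    return alpha * mean, alpha ** 2 * var + sigma ** 2


def kl_gauss(m_q, v_q, m_p, v_p):
    """KL(N(m_q, v_q) || N(m_p, v_p)) for one-dimensional Gaussians."""
    return 0.5 * (math.log(v_p / v_q) + (v_q + (m_q - m_p) ** 2) / v_p - 1.0)


def integral_kl(gen, target, steps=100):
    """Midpoint-rule average of the KL between diffused marginals over
    noise levels t in (0, 1), with uniform weighting w(t) = 1."""
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        total += kl_gauss(*diffuse(*gen, t), *diffuse(*target, t))
    return total / steps


if __name__ == "__main__":
    reference = (0.0, 1.0)                      # (mean, variance) of the untilted model
    tilted = tilt_gaussian(*reference, target=2.0, beta=1.0)
    print("reward-tilted target:", tilted)
    print("IKL(reference || tilted):", integral_kl(reference, tilted))
    print("IKL(tilted || tilted):", integral_kl(tilted, tilted))
```

In this toy the Integral KL is zero only when the generator's distribution equals the reward-tilted target, and every noise level contributes to the objective, not just the clean-image end.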
