DiNa-LRM: Diffusion-Native Latent Reward Model for Preference Optimization

ai-technology · 2026-05-25

Researchers propose DiNa-LRM, a diffusion-native latent reward model that formulates preference learning directly on noisy diffusion states. The method uses a noise-calibrated Thurstone likelihood whose uncertainty depends on the diffusion noise level, and it builds on a pretrained latent diffusion backbone with a timestep-conditioned reward head. It also supports inference-time noise ensembling for test-time scaling. The approach targets two problems with Vision-Language Model (VLM) rewards when optimizing diffusion and flow-matching models: their computational cost and the domain mismatch of scoring in pixel space.
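
To make the likelihood concrete, here is a minimal Python sketch of a Thurstone-style preference probability with noise-dependent uncertainty. The summary does not give the paper's parameterization, so everything specific here is an assumption for illustration: the function names, the variance form sigma(t)^2 = base_std^2 + (scale * t)^2, and the choice to give each sample its own noise level.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def noise_dependent_std(noise_level: float, base_std: float = 1.0,
                        scale: float = 1.0) -> float:
    """Hypothetical uncertainty schedule: the standard deviation grows
    with the diffusion noise level. This form is an assumption, not the
    paper's calibration."""
    return math.sqrt(base_std ** 2 + (scale * noise_level) ** 2)


def thurstone_preference_prob(reward_a: float, reward_b: float,
                              noise_a: float, noise_b: float) -> float:
    """P(A preferred over B) under a Thurstone model: each reward is a
    Gaussian latent score, so the difference is Gaussian with variance
    equal to the sum of the two variances."""
    variance = (noise_dependent_std(noise_a) ** 2
                + noise_dependent_std(noise_b) ** 2)
    return normal_cdf((reward_a - reward_b) / math.sqrt(variance))


def preference_nll(reward_preferred: float, reward_rejected: float,
                   noise_preferred: float, noise_rejected: float) -> float:
    """Negative log-likelihood of one labeled preference pair."""
    p = thurstone_preference_prob(reward_preferred, reward_rejected,
                                  noise_preferred, noise_rejected)
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)
```

Under this sketch, the same reward gap yields a probability closer to 0.5 at higher noise levels, which is the qualitative behavior one would expect from noise-dependent uncertainty: comparisons made on heavily noised states count as weaker evidence.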

Key facts

DiNa-LRM is a diffusion-native latent reward model.
It formulates preference learning directly on noisy diffusion states.
It uses a noise-calibrated Thurstone likelihood with diffusion-noise-dependent uncertainty.
It leverages a pretrained latent diffusion backbone with a timestep-conditioned reward head.
It supports inference-time noise ensembling.
It avoids the domain mismatch of pixel-space rewards from VLMs.
It reduces computation and memory cost compared to VLM-based rewards.
The paper is published on arXiv with ID 2602.11146.
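
Inference-time noise ensembling can be sketched as follows: score several noised copies of the same latent and average the results. This is a sketch under stated assumptions, not the paper's procedure. The names `add_noise` and `ensemble_reward`, the flow-matching-style interpolation used for noising, and the plain unweighted mean are all hypothetical, and `reward_fn` stands in for a timestep-conditioned reward head.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in type for a timestep-conditioned reward head:
# takes a noisy latent and its noise level, returns a scalar score.
RewardFn = Callable[[List[float], float], float]


def add_noise(latent: Sequence[float], noise_level: float,
              rng: random.Random) -> List[float]:
    """Assumed noising rule (flow-matching-style linear interpolation):
    x_t = (1 - t) * x + t * eps, with eps drawn from N(0, 1)."""
    return [(1.0 - noise_level) * x + noise_level * rng.gauss(0.0, 1.0)
            for x in latent]


def ensemble_reward(latent: Sequence[float], reward_fn: RewardFn,
                    noise_levels: Sequence[float],
                    draws_per_level: int = 4, seed: int = 0) -> float:
    """Average the reward over several noise levels and noise draws.
    More draws cost more compute but reduce the variance of the score."""
    if not noise_levels or draws_per_level < 1:
        raise ValueError("need at least one noise level and one draw")
    rng = random.Random(seed)  # seeded so repeated calls are reproducible
    scores = []
    for t in noise_levels:
        for _ in range(draws_per_level):
            scores.append(reward_fn(add_noise(latent, t, rng), t))
    return sum(scores) / len(scores)
```

For example, `ensemble_reward(latent, reward_fn, noise_levels=[0.2, 0.4, 0.6], draws_per_level=8)` spends 24 reward-head evaluations on one sample, which is the sense in which ensembling trades extra test-time compute for a steadier score.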
