ARTFEED — Contemporary Art Intelligence

DiNa-LRM: Diffusion-Native Latent Reward Model for Preference Optimization

ai-technology · 2026-05-25

Researchers propose DiNa-LRM, a diffusion-native latent reward model for preference optimization of diffusion and flow-matching models. Instead of relying on pixel-space rewards from Vision-Language Models (VLMs), it formulates preference learning directly on noisy diffusion states, which avoids the domain mismatch of VLM-based rewards and reduces their computation and memory cost. The method uses a noise-calibrated Thurstone likelihood whose uncertainty depends on the diffusion noise level, and it builds on a pretrained latent diffusion backbone with a timestep-conditioned reward head. It also supports inference-time noise ensembling as a form of test-time scaling.
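To make the Thurstone idea concrete, here is a minimal Python sketch of a preference likelihood whose uncertainty grows with the noise level. It is an illustration under stated assumptions, not the paper's formulation: the classic Thurstone form P(A preferred over B) = Phi((r_A - r_B) / sqrt(s_A^2 + s_B^2)) is standard, but the linear schedule in noise_std, its constants, and all function names are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def noise_std(t: float, base: float = 0.5, scale: float = 2.0) -> float:
    """Hypothetical uncertainty schedule (an assumption, not from the paper):
    the reward's standard deviation grows linearly with noise level t in [0, 1]."""
    return base + scale * t


def thurstone_nll(r_win: float, r_lose: float, t_win: float, t_lose: float) -> float:
    """Negative log-likelihood that the preferred sample wins the comparison.

    Each reward is treated as a Gaussian whose spread depends on the noise
    level of the state it was scored at, so the difference of the two rewards
    has variance noise_std(t_win)**2 + noise_std(t_lose)**2.
    """
    spread = math.sqrt(noise_std(t_win) ** 2 + noise_std(t_lose) ** 2)
    p_win = normal_cdf((r_win - r_lose) / spread)
    return -math.log(max(p_win, 1e-12))  # clamp to avoid log(0)


if __name__ == "__main__":
    # The same reward margin is penalized less confidently at high noise.
    print(thurstone_nll(2.0, 0.0, 0.0, 0.0))
    print(thurstone_nll(2.0, 0.0, 1.0, 1.0))
```

In this sketch, a fixed reward margin yields a smaller loss at low noise than at high noise, which is the qualitative effect a noise-dependent uncertainty term is meant to produce.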

Key facts

  • DiNa-LRM is a diffusion-native latent reward model.
  • It formulates preference learning directly on noisy diffusion states.
  • It uses a noise-calibrated Thurstone likelihood with diffusion-noise-dependent uncertainty.
  • It leverages a pretrained latent diffusion backbone with a timestep-conditioned reward head.
  • It supports inference-time noise ensembling.
  • It avoids the domain mismatch of pixel-space rewards from VLMs.
  • It reduces computation and memory cost compared to VLM-based rewards.
  • The paper is published on arXiv with ID 2602.11146.
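Inference-time noise ensembling can be pictured as scoring several independently noised copies of one latent and averaging the results. The Python sketch below is a toy under stated assumptions: the noising rule, the stand-in toy_reward_head, and every name and parameter are hypothetical, and none of it reproduces the model's actual reward head or noise schedule.

```python
import math
import random


def add_noise(latent, t, rng):
    """Assumed noising rule: sqrt(1 - t) * x + sqrt(t) * eps, with t in [0, 1)."""
    signal, noise = math.sqrt(1.0 - t), math.sqrt(t)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in latent]


def toy_reward_head(noisy_latent, t):
    """Hypothetical stand-in for a timestep-conditioned reward head.

    It averages the latent and divides by the signal level for timestep t,
    so its expected output matches the clean latent's mean.
    """
    mean = sum(noisy_latent) / len(noisy_latent)
    return mean / math.sqrt(1.0 - t)


def ensemble_reward(latent, timesteps, samples_per_t, seed=0):
    """Average the reward over several noise draws at each timestep."""
    rng = random.Random(seed)
    scores = [
        toy_reward_head(add_noise(latent, t, rng), t)
        for t in timesteps
        for _ in range(samples_per_t)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    latent = [1.0] * 64
    # More draws trade extra compute for a lower-variance score.
    print(ensemble_reward(latent, [0.2, 0.5], samples_per_t=1))
    print(ensemble_reward(latent, [0.2, 0.5], samples_per_t=100))
```

Raising samples_per_t spends more compute per score in exchange for lower variance, which is one plausible reading of noise ensembling as test-time scaling.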

Entities

Institutions

  • arXiv
