ARTFEED — Contemporary Art Intelligence

Semi-DPO: Semi-Supervised Learning for Noisy Preferences in Diffusion DPO

ai-technology · 2026-04-30

A recent study posted on arXiv (2604.24952) presents Semi-DPO, a semi-supervised learning method that tackles label noise in Diffusion Direct Preference Optimization (DPO). Human visual preferences are inherently multi-dimensional, yet current datasets reduce them to a single binary label (winner/loser), producing conflicting gradient signals that misguide training. Semi-DPO treats consistent preference pairs as clean labeled data and conflicting pairs as noisy unlabeled data. It first trains on the consensus-filtered clean subset, then uses the model itself as an implicit preference classifier to generate pseudo-labels for the noisy pairs, refining them iteratively. The authors report state-of-the-art results.
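The pipeline described above can be sketched in miniature. This is a hedged toy illustration, not the paper's implementation: the data format (feature pairs with per-annotator votes), the linear Bradley-Terry-style preference model standing in for the diffusion model's implicit classifier, and all thresholds are assumptions made for the sketch.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def consensus_split(pairs, threshold=1.0):
    """Split pairs into clean (unanimous annotator votes) and noisy (conflicting)."""
    clean, noisy = [], []
    for a, b, votes in pairs:
        agree = sum(votes) / len(votes)  # fraction of annotators preferring a over b
        if agree >= threshold:
            clean.append((a, b, 1))
        elif agree <= 1 - threshold:
            clean.append((a, b, 0))
        else:
            noisy.append((a, b))  # conflicting votes: keep as unlabeled
    return clean, noisy

def train(labeled, dim, epochs=200, lr=0.5):
    """Fit a linear preference model p(a > b) = sigmoid(w . (a - b)) by SGD."""
    w = [0.0] * dim
    for _ in range(epochs):
        for a, b, label in labeled:
            diff = [ai - bi for ai, bi in zip(a, b)]
            p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
            g = label - p  # gradient of the log-likelihood
            w = [wi + lr * g * di for wi, di in zip(w, diff)]
    return w

def pseudo_label(w, noisy, conf=0.9):
    """Use the trained model as an implicit classifier; keep only confident labels."""
    out = []
    for a, b in noisy:
        diff = [ai - bi for ai, bi in zip(a, b)]
        p = sigmoid(sum(wi * di for wi, di in zip(w, diff)))
        if p >= conf:
            out.append((a, b, 1))
        elif p <= 1 - conf:
            out.append((a, b, 0))
    return out

# Synthetic preference data: the "true" preference is higher first coordinate.
random.seed(0)
pairs = []
for _ in range(60):
    a = [random.random(), random.random()]
    b = [random.random(), random.random()]
    true = 1 if a[0] > b[0] else 0
    # Three annotators, each flipping the true label 20% of the time.
    votes = [true if random.random() < 0.8 else 1 - true for _ in range(3)]
    pairs.append((a, b, votes))

clean, noisy = consensus_split(pairs)
w = train(clean, dim=2)
for _ in range(3):  # iterative refinement rounds
    w = train(clean + pseudo_label(w, noisy), dim=2)
```

The key design choice mirrored here is that conflicting pairs are never discarded or averaged into a noisy label; they re-enter training only once the model itself labels them confidently.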

Key facts

  • arXiv paper 2604.24952
  • Semi-DPO addresses label noise in Diffusion DPO
  • Human visual preferences are multi-dimensional
  • Existing datasets use single binary labels
  • Conflicting gradient signals misguide DPO
  • Semi-DPO uses semi-supervised learning
  • Consistent pairs are clean labeled data
  • Conflicting pairs are noisy unlabeled data
  • Consensus-filtered clean subset for initial training
  • Implicit classifier generates pseudo-labels
  • Iterative refinement improves performance
  • State-of-the-art results reported

Entities

Institutions

  • arXiv