HP-Edit Framework Uses Human Preference Data to Improve Diffusion-Based Image Editing
A new post-training framework called HP-Edit has been developed to align diffusion-based image editing with human preferences, addressing a gap in applying Reinforcement Learning from Human Feedback (RLHF) to editing tasks. The framework leverages a small dataset of human preference scores and a pretrained visual large language model (VLM) to build an automated evaluator, HP-Scorer, which assesses editing quality across eight common tasks, including common object editing. The research also introduces RealPref-50K, a real-world dataset of 50,000 examples that balances these varied editing needs.

While methods such as Diffusion-DPO and Flow-GRPO have previously improved generation quality through reinforcement learning, scaling RLHF to editing has remained difficult because suitable datasets and frameworks are scarce. HP-Edit aims to overcome these limitations with an approach tailored to diverse editing applications. The framework is detailed in arXiv preprint 2604.19406v1, announced as a cross-listed abstract. By incorporating human feedback, the system seeks to improve the quality and relevance of edits produced by generative diffusion models, which are widely used for real-world content editing, and to make preference alignment more scalable and effective in practical editing scenarios.
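The preprint summary does not specify HP-Scorer's training objective; a common way to learn a scorer from pairwise human preference data (as in DPO-style methods) is a Bradley-Terry loss, sketched below with hypothetical scores. The function name and values are illustrative, not from the paper.

```python
import math

def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the preferred edit outranks the
    rejected one under a Bradley-Terry preference model:
    P(preferred > rejected) = sigmoid(s_w - s_l)."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A scorer that ranks the human-preferred edit higher incurs low loss;
# an inverted ranking is penalized more heavily.
low_loss = bradley_terry_loss(0.9, 0.1)   # correct margin
high_loss = bradley_terry_loss(0.1, 0.9)  # inverted ranking
```

Minimizing this loss over many labeled pairs pushes the scorer to reproduce human rankings, after which the scorer can serve as an automated reward signal for RLHF-style post-training.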
Key facts
- HP-Edit is a post-training framework for human preference-aligned image editing.
- It uses a small amount of human-preference scoring data to develop an automated evaluator called HP-Scorer.
- The framework introduces RealPref-50K, a dataset with 50,000 examples across eight common editing tasks.
- RealPref-50K balances coverage of common object editing alongside other diverse editing needs.
- HP-Scorer is built using a pretrained visual large language model (VLM).
- The research aims to efficiently apply Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing.
- Previous methods like Diffusion-DPO and Flow-GRPO have improved generation quality through reinforcement learning.
- The framework is detailed in arXiv preprint 2604.19406v1, announced as a cross-listed abstract.