HP-Edit Framework Uses Human Preference Data to Improve Diffusion-Based Image Editing
A new post-training framework called HP-Edit has been developed to align diffusion-based image editing with human preferences, addressing a gap in applying Reinforcement Learning from Human Feedback (RLHF) to editing tasks. The framework leverages a small dataset of human preference scores and a pretrained visual large language model (VLM) to build an automated evaluator, HP-Scorer, which assesses editing quality across eight common tasks, including common object editing. The research also introduces RealPref-50K, a real-world dataset of 50,000 examples that balances these varied editing needs.

While methods such as Diffusion-DPO and Flow-GRPO have previously improved generation quality through reinforcement learning, scaling RLHF to editing has remained difficult because suitable datasets and frameworks are scarce. HP-Edit aims to overcome these limitations with an approach tailored to diverse editing applications. The framework is detailed in arXiv preprint 2604.19406v1, announced as a cross-listed abstract. By incorporating human feedback, the system seeks to improve the quality and relevance of edits produced by generative diffusion models, which are widely used for real-world content editing, and to make preference alignment more scalable and effective in practical editing scenarios.
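The preprint summary does not specify HP-Scorer's training objective; a common way to learn a scorer from pairwise human preference data (as in DPO-style methods) is a Bradley-Terry loss, sketched below with hypothetical scores. The function name and values are illustrative, not from the paper.

```python
import math

def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the preferred edit outranks the
    rejected one under a Bradley-Terry preference model:
    P(preferred > rejected) = sigmoid(s_w - s_l)."""
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A scorer that ranks the human-preferred edit higher incurs low loss;
# an inverted ranking is penalized more heavily.
low_loss = bradley_terry_loss(0.9, 0.1)   # correct margin
high_loss = bradley_terry_loss(0.1, 0.9)  # inverted ranking
```

Minimizing this loss over many labeled pairs pushes the scorer to reproduce human rankings, after which the scorer can serve as an automated reward signal for RLHF-style post-training.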
Key facts
- HP-Edit is a post-training framework for human preference-aligned image editing.
- It uses a small amount of human-preference scoring data to develop an automated evaluator called HP-Scorer.
- The framework introduces RealPref-50K, a dataset with 50,000 examples across eight common editing tasks.
- RealPref-50K balances coverage of common object editing alongside other diverse editing needs.
- HP-Scorer is built using a pretrained visual large language model (VLM).
- The research aims to efficiently apply Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing.
- Previous methods like Diffusion-DPO and Flow-GRPO have improved generation quality through reinforcement learning.
- The framework is detailed in arXiv preprint 2604.19406v1, announced as a cross-listed abstract.