ARTFEED — Contemporary Art Intelligence

UniEditBench Introduces Unified Benchmark for Image and Video Editing Evaluation

ai-technology · 2026-04-20

A new benchmark, UniEditBench, targets the fragmented evaluation of visual editing models. Existing benchmarks are typically tailored to a single editing paradigm, which makes fair cross-paradigm comparison difficult, and video editing in particular lacks reliable benchmarks. Common automatic metrics often misalign with human preferences, while using large multimodal models directly as evaluators is prohibitively expensive in both computational and financial terms. UniEditBench evaluates reconstruction-based and instruction-driven methods under a shared protocol. Its structured taxonomy covers nine image operations and eight video operations, including challenging compositional tasks such as counting and spatial reordering. To make evaluation scalable, the approach distills a high-capacity multimodal large language model (MLLM) into a lighter evaluator. The work is documented in arXiv preprint 2604.15871v1.

Key facts

  • UniEditBench is a unified benchmark for image and video editing evaluation
  • Existing benchmarks are fragmented and tailored to specific paradigms
  • Video editing lacks reliable evaluation benchmarks
  • Common automatic metrics often misalign with human preference
  • Using large multimodal models as evaluators incurs high computational and financial costs
  • UniEditBench supports reconstruction-based and instruction-driven methods
  • Includes taxonomy of nine image operations and eight video operations
  • Covers challenging compositional tasks like counting and spatial reordering
