UniEditBench Introduces Unified Benchmark for Image and Video Editing Evaluation

ai-technology · 2026-04-20

A new benchmark, UniEditBench, targets the fragmented state of evaluation for visual editing models. Existing benchmarks are typically tailored to a single editing paradigm, which makes fair cross-paradigm comparison difficult, and video editing in particular lacks reliable benchmarks. Two further problems compound this: common automatic metrics often misalign with human preferences, and deploying large multimodal models as evaluators carries prohibitive computational and financial costs.

UniEditBench supports both reconstruction-based and instruction-driven methods under a shared protocol. Its structured taxonomy covers nine image operations and eight video operations, including challenging compositional tasks such as counting and spatial reordering. To keep evaluation scalable, the approach distills a high-capacity multimodal large language model (MLLM), which addresses the cost of using such models directly as evaluators. The work is documented in arXiv preprint 2604.15871v1.
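How well an automatic metric agrees with human preferences is commonly quantified with a rank correlation between metric scores and human ratings of the same edits. The summary does not say which agreement measure UniEditBench uses, so the following is only a minimal sketch using Spearman correlation; the function names and the sample scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical scores for five edited outputs (illustrative values only).
    metric_scores = [0.91, 0.85, 0.78, 0.60, 0.55]  # an automatic metric
    human_ratings = [3, 5, 4, 2, 1]                 # mean human preference
    print(f"Spearman correlation: {spearman(metric_scores, human_ratings):.2f}")
```

A value near 1 means the metric ranks edits the way human raters do. Values near 0 or below indicate the kind of misalignment the benchmark's authors describe.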

Key facts

UniEditBench is a unified benchmark for image and video editing evaluation
Existing benchmarks are fragmented and tailored to specific paradigms
Video editing lacks reliable evaluation benchmarks
Common automatic metrics often misalign with human preferences
Using large multimodal models as evaluators incurs high computational and financial costs
UniEditBench supports reconstruction-based and instruction-driven methods
Includes a taxonomy of nine image operations and eight video operations
Covers challenging compositional tasks like counting and spatial reordering

Sources

arXiv cs.AI — 2026-04-20