EntityBench: Benchmark for Consistent Multi-Shot Video Generation

ai-technology · 2026-05-16

Researchers have introduced EntityBench, a benchmark for evaluating how well multi-shot video generation systems keep characters, objects, and locations consistent across long sequences. The benchmark comprises 140 episodes (2,491 shots) drawn from real narrative media, annotated with per-shot entity schedules and organized into easy, medium, and hard difficulty tiers. An episode can contain up to 50 shots, 13 cross-shot characters, 8 locations, and 22 objects, and an entity can reappear after a gap of up to 48 shots. EntityBench comes with an evaluation suite that measures intra-shot quality, prompt-following alignment, and cross-shot consistency, and that includes a fidelity gate for accurate entity representation. The benchmark addresses the lack of standardized comparison in multi-shot video generation, where existing evaluations often rely on limited entity coverage and simple consistency metrics.

Key facts

EntityBench is a benchmark for multi-shot video generation.
It includes 140 episodes and 2,491 shots.
Episodes are derived from real narrative media.
Per-shot entity schedules track characters, objects, and locations.
Three difficulty tiers: easy, medium, hard.
Up to 50 shots per episode.
Up to 13 cross-shot characters, 8 locations, 22 objects.
Recurrence gaps up to 48 shots.
Evaluation suite has three pillars: intra-shot quality, prompt-following alignment, cross-shot consistency.
Includes a fidelity gate for accurate entity representation.
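
To make the per-shot entity schedule and the recurrence-gap figure concrete, here is a minimal Python sketch. It is not EntityBench's actual data format or code: the `Shot` class, its field names, the `recurrence_gaps` helper, and the toy episode are all hypothetical. It also assumes that a recurrence gap counts the shots between two consecutive appearances of the same entity, which would be consistent with a maximum gap of 48 in a 50-shot episode, though the source does not spell out the definition.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """One shot's entity schedule (hypothetical schema, not EntityBench's format)."""
    index: int  # 1-based position of the shot within the episode
    characters: set[str] = field(default_factory=set)
    locations: set[str] = field(default_factory=set)
    objects: set[str] = field(default_factory=set)

    def entities(self) -> set[tuple[str, str]]:
        """Return every scheduled entity as a (kind, name) pair."""
        tagged = {("character", name) for name in self.characters}
        tagged |= {("location", name) for name in self.locations}
        tagged |= {("object", name) for name in self.objects}
        return tagged


def recurrence_gaps(shots: list[Shot]) -> dict[tuple[str, str], int]:
    """Largest gap per recurring entity.

    Assumption: a gap is the number of shots strictly between two
    consecutive appearances, so back-to-back appearances give a gap of 0.
    Entities that appear only once are omitted from the result.
    """
    last_seen: dict[tuple[str, str], int] = {}
    max_gap: dict[tuple[str, str], int] = {}
    for shot in sorted(shots, key=lambda s: s.index):
        for entity in shot.entities():
            if entity in last_seen:
                gap = shot.index - last_seen[entity] - 1
                max_gap[entity] = max(max_gap.get(entity, 0), gap)
            last_seen[entity] = shot.index
    return max_gap


# Toy four-shot episode (invented data, for illustration only).
episode = [
    Shot(1, characters={"Ana"}, locations={"kitchen"}),
    Shot(2, characters={"Ben"}, locations={"street"}),
    Shot(3, characters={"Ben"}, locations={"street"}, objects={"red umbrella"}),
    Shot(4, characters={"Ana"}, locations={"kitchen"}, objects={"red umbrella"}),
]

gaps = recurrence_gaps(episode)
for (kind, name), gap in sorted(gaps.items()):
    print(f"{kind:9s} {name:12s} max gap = {gap}")
```

In this toy episode, Ana and the kitchen are absent from shots 2 and 3, so each has a gap of 2, while Ben, the street, and the umbrella recur in adjacent shots and have a gap of 0. Under the same assumed definition, longer gaps mean a generator must reproduce an entity after more intervening shots in which it did not appear.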

Sources

arXiv cs.AI — 2026-05-16