QEVA: Reference-Free Metric for Video Summarization Evaluation

other · 2026-04-29

Researchers propose QEVA, a reference-free metric for evaluating video-to-text summarization by comparing summaries directly against source videos using multimodal question answering. QEVA assesses summaries across three dimensions: Coverage, Factuality, and Chronology. The team also introduces MLVU(VS)-Eval, a benchmark of 800 summaries generated from 200 videos using state-of-the-art video-language models. Experiments show QEVA achieves higher correlation with human judgments than existing metrics.
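This summary does not give QEVA's scoring formulas, so the following is only a rough illustration of the question-answering idea, not the paper's method. It assumes that questions are derived from the source video, that each is answered once from the video and once from the summary alone, and that a score is the fraction of questions on which the two answers agree. `QAItem`, `agreement_rate`, and the exact-match comparison are all hypothetical choices made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class QAItem:
    """One question about the source video (hypothetical structure)."""
    question: str
    video_answer: str    # answer obtained by querying the video
    summary_answer: str  # answer obtained from the summary text alone


def agreement_rate(items: list[QAItem]) -> float:
    """Fraction of questions where the summary-based answer matches the
    video-based answer. Exact match after normalization is an assumption;
    a real system would likely need a softer comparison."""
    if not items:
        return 0.0
    matches = sum(
        item.video_answer.strip().lower() == item.summary_answer.strip().lower()
        for item in items
    )
    return matches / len(items)
```

Under these assumptions, a dimension such as Coverage or Factuality could be scored by running `agreement_rate` over a question set written for that dimension; how QEVA actually builds and scores its questions is described in the paper itself.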

Key facts

QEVA is a reference-free evaluation metric for video summarization.
It uses multimodal question answering to compare summaries against source videos.
Three evaluation dimensions: Coverage, Factuality, and Chronology.
MLVU(VS)-Eval benchmark includes 800 summaries from 200 videos.
The benchmark's summaries were generated by state-of-the-art video-language multimodal models.
QEVA shows higher correlation with human judgments than existing metrics.
Paper published on arXiv (2604.24052).
QEVA addresses limitations of n-gram overlap and LLM-based metrics.
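The claim of higher correlation with human judgments is typically checked by scoring the same summaries with the metric and with human raters, then computing a rank correlation between the two lists. This summary does not say which coefficient the authors used; the sketch below assumes Spearman's rank correlation, implemented in plain Python with average ranks for ties. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

A metric whose scores order the summaries the same way the human ratings do yields a value near 1; comparing this value across metrics is one way a statement like "higher correlation than existing metrics" can be made concrete.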
