QEVA: Reference-Free Metric for Video Summarization Evaluation
Researchers propose QEVA, a reference-free metric for evaluating video-to-text summarization by comparing summaries directly against source videos using multimodal question answering. QEVA assesses summaries across three dimensions: Coverage, Factuality, and Chronology. The team also introduces MLVU(VS)-Eval, a benchmark of 800 summaries generated from 200 videos using state-of-the-art video-language models. Experiments show QEVA achieves higher correlation with human judgments than existing metrics.
Key facts
- QEVA is a reference-free evaluation metric for video summarization.
- It uses multimodal question answering to compare summaries against source videos.
- Three evaluation dimensions: Coverage, Factuality, and Chronology.
- MLVU(VS)-Eval benchmark includes 800 summaries from 200 videos.
- The summaries were generated by state-of-the-art video-language models.
- QEVA shows higher correlation with human judgments than existing metrics.
- Paper published on arXiv (2604.24052).
- Addresses limitations of n-gram overlap and LLM-based metrics.
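As an illustration of what "correlation with human judgments" means in practice, the sketch below compares two hypothetical metrics against human ratings using Spearman rank correlation. This summary does not say which correlation coefficient the paper reports, so Spearman is an assumption here. The function names and all scores are invented for the example; none of them are QEVA results.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human ratings for five summaries and two made-up metrics.
human = [1, 2, 3, 4, 5]
metric_a = [0.2, 0.1, 0.5, 0.7, 0.9]  # mostly agrees with the human ordering
metric_b = [0.9, 0.3, 0.1, 0.8, 0.2]  # largely disagrees

print(round(spearman(human, metric_a), 3))  # about 0.9
print(round(spearman(human, metric_b), 3))  # about -0.5
```

A metric whose scores rank summaries in nearly the same order as human raters gets a coefficient close to 1, which is the sense in which one metric "correlates better" than another.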
Entities
Institutions
- arXiv