ARTFEED — Contemporary Art Intelligence

QEVA: Reference-Free Metric for Video Summarization Evaluation

other · 2026-04-29

Researchers propose QEVA, a reference-free metric for evaluating video-to-text summarization by comparing summaries directly against source videos using multimodal question answering. QEVA assesses summaries across three dimensions: Coverage, Factuality, and Chronology. The team also introduces MLVU(VS)-Eval, a benchmark of 800 summaries generated from 200 videos using state-of-the-art video-language models. Experiments show QEVA achieves higher correlation with human judgments than existing metrics.
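The article does not spell out how QEVA turns question answering into dimension scores. As an illustration only, here is a minimal Python sketch of one plausible QA-based design. It assumes that Coverage is the share of video-derived questions the summary answers correctly, that Factuality is the share of summary-derived questions whose answers the video supports, and that Chronology is the share of event pairs the summary orders the same way as the video. The `QAPair` schema, the function names, and the exact-match answer comparison are all hypothetical. The model calls that would produce the answers are left out, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass
class QAPair:
    """Hypothetical schema: one question plus the answer each source yields."""
    question: str
    video_answer: str    # answer a multimodal QA model would give from the video
    summary_answer: str  # answer a QA model would give from the summary ("" if unanswerable)


def _normalize(text: str) -> str:
    # Crude normalization (lowercase, collapse whitespace); a stand-in for a learned answer matcher.
    return " ".join(text.lower().split())


def agreement_rate(pairs: Sequence[QAPair]) -> float:
    """Fraction of questions where the summary gives an answer matching the video's answer."""
    if not pairs:
        return 0.0
    hits = sum(
        1
        for p in pairs
        if p.summary_answer and _normalize(p.summary_answer) == _normalize(p.video_answer)
    )
    return hits / len(pairs)


def coverage_score(video_questions: Sequence[QAPair]) -> float:
    # Assumed definition: questions derived from the video; does the summary answer them correctly?
    return agreement_rate(video_questions)


def factuality_score(summary_questions: Sequence[QAPair]) -> float:
    # Assumed definition: questions derived from the summary; does the video support those answers?
    return agreement_rate(summary_questions)


def chronology_score(summary_order: Sequence[str], video_order: Sequence[str]) -> float:
    """Assumed definition: fraction of event pairs the summary orders as the video does."""
    position = {event: i for i, event in enumerate(video_order)}
    # Keep only events that also occur in the video, dropping repeats but preserving order.
    shared = list(dict.fromkeys(e for e in summary_order if e in position))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    concordant = sum(1 for a, b in pairs if position[a] < position[b])
    return concordant / len(pairs)


video_qs = [
    QAPair("Who opens the door?", "the chef", "The chef"),
    QAPair("What is cooked?", "pasta", ""),  # the summary never mentions the dish
]
print(coverage_score(video_qs))  # 0.5
print(chronology_score(["enter", "cook", "serve"], ["enter", "cook", "serve"]))  # 1.0
```

Exact string matching is the weakest part of this sketch. A practical QA-based metric would more likely use a model or a similarity threshold to decide whether two answers agree.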

Key facts

  • QEVA is a reference-free evaluation metric for video summarization.
  • It uses multimodal question answering to compare summaries against source videos.
  • Three evaluation dimensions: Coverage, Factuality, and Chronology.
  • MLVU(VS)-Eval benchmark includes 800 summaries from 200 videos.
  • Summaries were generated by state-of-the-art video-language models.
  • QEVA shows higher correlation with human judgments than existing metrics.
  • Paper available as an arXiv preprint (2604.24052).
  • Addresses limitations of n-gram overlap and LLM-based metrics.
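A claim of "higher correlation with human judgments" is usually checked by rank-correlating a metric's scores with human ratings of the same summaries. The article does not say which coefficient the authors used. This sketch assumes Spearman's rank correlation, computed in plain Python as the Pearson correlation of the two rank vectors, with tied values given their average rank. The function names and sample scores are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average of sorted positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sx * sy)


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


# Hypothetical metric scores and human ratings for four summaries.
metric = [0.9, 0.4, 0.7, 0.2]
human = [5, 2, 4, 1]
print(round(spearman(metric, human), 3))  # 1.0
```

Rank correlation is a common choice here because it rewards a metric for ordering summaries the way humans do, without requiring the metric's scale to match the human rating scale.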

Entities

Institutions

  • arXiv