New Benchmark and Reward Models for Video Understanding

ai-technology · 2026-05-11

A team of researchers has introduced the Video Understanding Reward Bench (VURB), a benchmark for evaluating video understanding reward models. It contains 2,100 preference pairs with chain-of-thought reasoning traces that average 1,143 tokens. The team also built the Video Understanding Preference Dataset (VUP-35K) using a fully automated pipeline. They trained two reward models, the discriminative VideoDRM and the generative VideoGRM, both of which achieve state-of-the-art performance on VURB and on other video-related tasks.
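Preference-pair benchmarks such as VURB are commonly scored by checking whether a reward model ranks the preferred response above the rejected one. The sketch below illustrates that pairwise-accuracy metric for a discriminative model that returns one scalar score per response. It rests on assumptions: the `PreferencePair` fields, the `pairwise_accuracy` helper, and the toy `length_score` scorer are hypothetical stand-ins. They are not VURB's data format or VideoDRM, and the source does not state VURB's exact scoring rule.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PreferencePair:
    # Hypothetical fields; VURB's actual schema is not described in the source.
    video_id: str
    question: str
    chosen: str
    rejected: str


def pairwise_accuracy(
    pairs: Sequence[PreferencePair],
    score: Callable[[str, str, str], float],
) -> float:
    """Fraction of pairs where the scalar reward prefers `chosen` over `rejected`.

    Ties count as incorrect, a common but not universal convention.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = 0
    for p in pairs:
        s_chosen = score(p.video_id, p.question, p.chosen)
        s_rejected = score(p.video_id, p.question, p.rejected)
        if s_chosen > s_rejected:
            correct += 1
    return correct / len(pairs)


# Toy scorer standing in for a discriminative reward model (hypothetical):
# it simply rates longer responses higher and ignores the video.
def length_score(video_id: str, question: str, response: str) -> float:
    return float(len(response))


if __name__ == "__main__":
    pairs = [
        PreferencePair("v1", "What happens first?", "The man opens the door, then sits.", "He sits."),
        PreferencePair("v2", "How many cars pass?", "Two", "Three cars pass by the camera."),
    ]
    print(pairwise_accuracy(pairs, length_score))  # 0.5
```

In a real evaluation, `score` would wrap a trained reward model that also conditions on the video frames. The metric itself only needs a way to compare two scores.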

Key facts

VURB benchmark features 2,100 preference pairs
Chain-of-thought reasoning traces average 1,143 tokens
VUP-35K dataset constructed via automated pipeline
VideoDRM is a discriminative reward model
VideoGRM is a generative reward model
Both models achieve state-of-the-art performance on VURB and other video-related tasks
Benchmark covers general, long, and reasoning-oriented video tasks
Majority voting evaluation is used
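The source does not say exactly how majority voting is applied. One common pattern for generative reward models is to sample several chain-of-thought judgments for the same pair and keep the most frequent verdict. The sketch below assumes each sampled judgment has already been parsed into a label such as "A" or "B". The function names `majority_vote` and `voted_accuracy` and the rule that a tie counts as wrong are hypothetical choices, not the authors' protocol.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(verdicts: Sequence[str]) -> Optional[str]:
    """Return the most frequent verdict, or None if the top count is tied.

    Each verdict is assumed to be a parsed label (e.g. "A" or "B")
    extracted from one sampled judgment.
    """
    if not verdicts:
        return None
    counts = Counter(verdicts).most_common()
    # most_common() sorts by count, highest first; a tie at the top means no majority.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def voted_accuracy(samples: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Accuracy over pairs when each pair is judged by majority vote.

    `samples[i]` holds the sampled verdicts for pair i and `gold[i]` is the
    preferred label. Pairs without a clear majority count as wrong.
    """
    if len(samples) != len(gold) or not gold:
        raise ValueError("samples and gold must be non-empty and equal length")
    hits = sum(majority_vote(s) == g for s, g in zip(samples, gold))
    return hits / len(gold)


if __name__ == "__main__":
    samples = [["A", "A", "B"], ["B", "A", "B", "A"], ["B", "B", "B"]]
    gold = ["A", "A", "B"]
    # Pair 1 votes A (correct), pair 2 is tied (wrong), pair 3 votes B (correct).
    print(voted_accuracy(samples, gold))  # 2 of 3 pairs correct
```

Using an odd number of samples per pair avoids two-way ties. Counting ties as wrong keeps the metric conservative rather than rewarding indecisive judges.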

Sources

arXiv cs.AI — 2026-05-11