ARTFEED — Contemporary Art Intelligence

New Benchmark and Reward Models for Video Understanding

ai-technology · 2026-05-11

Researchers have introduced the Video Understanding Reward Bench (VURB), a benchmark for assessing video understanding reward models. It comprises 2,100 preference pairs with reasoning traces that average 1,143 tokens. The team also built the Video Understanding Preference Dataset (VUP-35K) through a fully automated pipeline and trained two reward models, VideoDRM (discriminative) and VideoGRM (generative), which show leading performance on VURB and on other video-related tasks.

Key facts

  • VURB benchmark features 2,100 preference pairs
  • Chain-of-thought reasoning traces average 1,143 tokens
  • VUP-35K dataset constructed via automated pipeline
  • VideoDRM is a discriminative reward model
  • VideoGRM is a generative reward model
  • Both models achieve state-of-the-art performance on VURB and other video-related tasks
  • Benchmark covers general, long, and reasoning-oriented video tasks
  • Majority voting evaluation is used
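
The summary does not say how majority voting is applied, so the following is only a minimal sketch of one common reading: a reward model is sampled several times per preference pair, the most frequent verdict is taken as its answer, and accuracy is the share of pairs where that answer matches the human-preferred response. The function names, the "A"/"B" labels, and the sample data are all hypothetical and are not taken from VURB.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent verdict; None if there are no votes or a tie."""
    counts = Counter(votes).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between the top verdicts: no clear majority
    return counts[0][0]


def pairwise_accuracy(sampled_votes, gold_labels):
    """Fraction of preference pairs whose voted verdict matches the gold label."""
    correct = sum(
        majority_vote(votes) == gold
        for votes, gold in zip(sampled_votes, gold_labels)
    )
    return correct / len(gold_labels)


# Hypothetical data: each inner list holds repeated verdicts for one pair.
sampled_votes = [
    ["A", "A", "B"],  # majority A
    ["B", "B", "B"],  # majority B
    ["A", "B", "B"],  # majority B
    ["A", "B"],       # tie, counted as incorrect
]
gold_labels = ["A", "B", "A", "A"]

accuracy = pairwise_accuracy(sampled_votes, gold_labels)
print(f"accuracy = {accuracy:.2f}")
```

Counting ties as incorrect is a conservative choice made for this sketch; sampling an odd number of verdicts per pair avoids ties for binary preferences.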
