VideoGameBench: Testing VLMs on 1990s Video Games

ai-technology · 2026-05-18

Researchers have introduced VideoGameBench, a benchmark of ten popular video games from the 1990s designed to evaluate vision-language models (VLMs) on abilities such as perception, spatial navigation, and memory management. Unlike existing benchmarks that rely on coding or math problems, VideoGameBench requires models to complete entire games using only raw visual inputs and a high-level description of each game's objectives and controls. Three of the ten games are kept secret to encourage generalization to unseen environments. The work is detailed in arXiv paper 2505.18134.
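The interaction pattern described above (the model sees a frame, reads the objective and controls, and answers with a button press) can be illustrated with a toy loop. This is only a sketch under stated assumptions and not the benchmark's actual harness: ToyGame, run_episode, and always_right are hypothetical names invented for illustration, and a real VLM call would take the place of the stub policy.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy stand-in for a raw screenshot: rows of pixel values.
Frame = List[List[int]]
# A policy maps (instructions, current frame, past actions) to a button name.
Policy = Callable[[str, Frame, List[str]], str]


@dataclass
class ToyGame:
    """Hypothetical one-dimensional game: walk right until `goal` is reached."""
    goal: int = 3
    position: int = 0

    def screenshot(self) -> Frame:
        # One-row "image": 1 marks the player, 0 is empty space.
        return [[1 if i == self.position else 0 for i in range(self.goal + 1)]]

    def press(self, button: str) -> None:
        if button == "right":
            self.position = min(self.position + 1, self.goal)
        elif button == "left":
            self.position = max(self.position - 1, 0)

    def completed(self) -> bool:
        return self.position == self.goal


def run_episode(game: ToyGame, policy: Policy, instructions: str,
                max_steps: int = 20) -> bool:
    """Feed frames to the policy until the game is completed or steps run out."""
    history: List[str] = []  # the only memory the policy gets between steps
    for _ in range(max_steps):
        if game.completed():
            return True
        frame = game.screenshot()  # the policy never sees internal game state
        action = policy(instructions, frame, history)
        history.append(action)
        game.press(action)
    return game.completed()


def always_right(instructions: str, frame: Frame, history: List[str]) -> str:
    """Stub policy standing in for a VLM; it ignores its inputs."""
    return "right"


if __name__ == "__main__":
    done = run_episode(ToyGame(), always_right,
                       "Reach the right edge. Buttons: left, right.")
    print("completed:", done)
```

In this sketch the policy is given only the frame, the text instructions, and its own action history, which mirrors the constraint that models work from raw visual input rather than internal game state.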

Key facts

VideoGameBench includes 10 popular video games from the 1990s.
VLMs interact with games in real time using only raw visual inputs.
Three games are kept secret to promote generalization.
The benchmark tests perception, spatial navigation, and memory management.
The paper is available on arXiv with ID 2505.18134.

Sources

arXiv cs.AI — 2026-05-18