VLATIM Benchmark Tests VLM Logical Reasoning in Puzzle Games

ai-technology · 2026-05-13

A new benchmark, Vision-Language Against The Incredible Machine (VLATIM), tests how well Vision-Language(-Action) Models (VLMs) handle logical challenges in point-and-click puzzle games. It is built on the classic game The Incredible Machine 2 (TIM). VLATIM aims to bridge the gap between high-level logical reasoning and continuous action spaces that demand precise mouse movements. Its five sequential parts assess abilities ranging from basic visual grounding to full puzzle completion. Reported results show a significant gap between reasoning and execution: larger proprietary models plan better, but carrying those plans out remains a weakness.

Key facts

VLATIM benchmark introduced for evaluating VLMs in puzzle games
Based on The Incredible Machine 2 (TIM)
Targets gap between logical reasoning and continuous action spaces
Five progressive parts, ranging from visual grounding and domain understanding through multi-step manipulation to full puzzle solving
Large proprietary models show superior planning, but a gap between reasoning and execution remains
