ARTFEED — Contemporary Art Intelligence

VLATIM Benchmark Tests VLM Logical Reasoning in Puzzle Games

ai-technology · 2026-05-13

A new benchmark, Vision-Language Against The Incredible Machine (VLATIM), tests how well Vision-Language(-Action) Models (VLMs) handle logical challenges in point-and-click puzzle games. It is built on the classic game The Incredible Machine 2 (TIM). VLATIM aims to connect high-level logical reasoning with a continuous action space that demands precise mouse movements. Its five progressive parts assess abilities ranging from basic visual grounding to full puzzle completion. Reported results show a significant gap between reasoning and execution: larger proprietary models plan better, yet still struggle to carry those plans out.

Key facts

  • VLATIM benchmark introduced for evaluating VLMs in puzzle games
  • Based on The Incredible Machine 2 (TIM)
  • Targets gap between logical reasoning and continuous action spaces
  • Five progressive parts, ranging from visual grounding and domain understanding through multi-step manipulation to full puzzle solving
  • Larger proprietary models show superior planning, but an execution gap remains

Entities

Institutions

  • arXiv

Sources