Reinforcement Learning Boosts VLM Decision-Making in Games
A recent arXiv preprint presents Odysseus, a technique for scaling vision-language models (VLMs) to more than 100 decision-making turns in video games through reinforcement learning (RL). The study centers on Super Mario Land, a visually rich environment that demands coordinated perception, reasoning, and action over long horizons. The researchers systematically analyze the key algorithmic components and introduce an adapted version of PPO with a lightweight turn-level critic, which improves training stability and sample efficiency over critic-free methods such as GRPO and Reinforce++.
Their findings indicate that pretrained VLMs bring strong initial capabilities and that RL training substantially improves long-horizon performance. The work addresses the limitations of existing approaches, which either depend on large-scale supervised fine-tuning with human trajectories or restrict RL to short horizons (roughly 20-30 turns). The results suggest that RL-based training can effectively adapt VLMs for interactive decision-making in complex, multi-step environments.
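The preprint's core algorithmic idea is a critic that scores each decision turn rather than each token. The details of Odysseus's implementation are not given here, so the following is only a minimal sketch of what a turn-level critic with turn-granularity advantage estimation and a standard PPO clipped loss might look like; the class and function names, hidden sizes, and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TurnCritic(nn.Module):
    """Hypothetical lightweight critic: one scalar value per decision turn,
    computed from a pooled per-turn embedding instead of per-token states."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, turn_embeddings: torch.Tensor) -> torch.Tensor:
        # turn_embeddings: (num_turns, hidden_dim) -> values: (num_turns,)
        return self.head(turn_embeddings).squeeze(-1)


def turn_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation computed over turns, not tokens."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, applied at turn granularity."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Scoring turns instead of tokens keeps the critic small and makes credit assignment over 100+ turns tractable, which is plausibly why the paper reports better stability than critic-free baselines like GRPO that estimate advantages from group-relative returns alone.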
Key facts
- Odysseus scales VLMs to 100+ turn decision-making in games via RL.
- Research focuses on Super Mario Land as a test environment.
- Proposes adapted PPO with a lightweight turn-level critic.
- Improves stability and efficiency over GRPO and Reinforce++.
- Pretrained VLMs provide strong initial capabilities.
- Existing methods rely on SFT or short-horizon RL (20-30 turns).
- RL training enhances long-horizon performance.
- Preprint posted on arXiv (2605.00347).
Entities
Institutions
- arXiv