ARTFEED — Contemporary Art Intelligence

Reinforcement Learning Boosts VLM Decision-Making in Games

ai-technology · 2026-05-04

A recent arXiv preprint presents Odysseus, a technique that uses reinforcement learning (RL) to extend vision-language models (VLMs) to video-game decision-making over more than 100 turns. The study centers on Super Mario Land, a visually rich environment that demands coordinated perception, reasoning, and action over long horizons. The researchers analyze the essential algorithmic components and introduce a modified version of PPO with a lightweight turn-level critic, which improves training stability and sample efficiency over critic-free methods such as GRPO and REINFORCE++.

Their findings indicate that pretrained VLMs already provide strong initial capabilities, and that RL training substantially improves long-horizon performance. The work addresses the shortcomings of existing approaches, which either depend on extensive supervised fine-tuning (SFT) on human trajectories or restrict RL to short horizons of roughly 20-30 turns. The results suggest that RL-based training can adapt VLMs for interactive decision-making in complex, multi-step environments.
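The core design choice described above — a PPO-style clipped objective paired with a critic that predicts one value per decision-making turn rather than per token — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, argument layout, and coefficients are assumptions, and the actual method may normalize advantages or weight the value loss differently.

```python
import math

def turn_level_ppo_loss(logp_new, logp_old, values, returns,
                        clip_eps=0.2, vf_coef=0.5):
    """Hypothetical sketch: PPO clipped objective with a turn-level critic.

    All arguments are per-turn lists (one entry per decision-making turn,
    not per token). `values` are the lightweight critic's per-turn value
    predictions; `returns` are empirical discounted returns per turn.
    """
    n = len(logp_new)

    # Turn-level advantages: empirical return minus critic prediction,
    # normalized across the batch of turns for training stability.
    adv = [r - v for r, v in zip(returns, values)]
    mean = sum(adv) / n
    std = (sum((a - mean) ** 2 for a in adv) / n) ** 0.5
    adv = [(a - mean) / (std + 1e-8) for a in adv]

    # PPO clipped surrogate: take the pessimistic (min) of the unclipped
    # and clipped importance-weighted advantage, then negate for a loss.
    policy_terms = []
    for ln, lo, a in zip(logp_new, logp_old, adv):
        ratio = math.exp(ln - lo)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        policy_terms.append(-min(ratio * a, clipped * a))
    policy_loss = sum(policy_terms) / n

    # Lightweight critic regresses toward the per-turn returns.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    return policy_loss + vf_coef * value_loss
```

The point of the turn-level critic is variance reduction over long horizons: with 100+ turns, critic-free estimators like GRPO or REINFORCE++ must attribute a sparse episode-level signal across many turns, whereas a per-turn baseline gives each decision its own advantage estimate.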

Key facts

  • Odysseus scales VLMs to 100+ turn decision-making in games via RL.
  • Research focuses on Super Mario Land as a test environment.
  • Proposes adapted PPO with a lightweight turn-level critic.
  • Improves stability and efficiency over GRPO and REINFORCE++.
  • Pretrained VLMs provide strong initial capabilities.
  • Existing methods rely on SFT or short-horizon RL (roughly 20-30 turns).
  • RL training enhances long-horizon performance.
  • Study published on arXiv (2605.00347).

Entities

Institutions

  • arXiv

Sources