Reinforcement Learning Boosts VLM Decision-Making in Games
A recent arXiv preprint presents Odysseus, a technique for scaling vision-language models (VLMs) to more than 100 decision-making turns in video games through reinforcement learning (RL). The study centers on Super Mario Land, a visually rich environment that demands coordinated perception, reasoning, and action over long horizons. The researchers systematically analyze the key algorithmic components and introduce an adapted version of PPO with a lightweight turn-level critic, which improves training stability and sample efficiency over critic-free methods such as GRPO and Reinforce++.
Their findings indicate that pretrained VLMs bring strong initial capabilities and that RL training substantially improves long-horizon performance. The work addresses the limitations of existing approaches, which either depend on large-scale supervised fine-tuning with human trajectories or restrict RL to short horizons (roughly 20-30 turns). The results suggest that RL-based training can effectively adapt VLMs for interactive decision-making in complex, multi-step environments.
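The preprint's core algorithmic idea is a critic that scores each decision turn rather than each token. The details of Odysseus's implementation are not given here, so the following is only a minimal sketch of what a turn-level critic with turn-granularity advantage estimation and a standard PPO clipped loss might look like; the class and function names, hidden sizes, and hyperparameters are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TurnCritic(nn.Module):
    """Hypothetical lightweight critic: one scalar value per decision turn,
    computed from a pooled per-turn embedding instead of per-token states."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, turn_embeddings: torch.Tensor) -> torch.Tensor:
        # turn_embeddings: (num_turns, hidden_dim) -> values: (num_turns,)
        return self.head(turn_embeddings).squeeze(-1)


def turn_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation computed over turns, not tokens."""
    advantages = torch.zeros_like(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective, applied at turn granularity."""
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Scoring turns instead of tokens keeps the critic small and makes credit assignment over 100+ turns tractable, which is plausibly why the paper reports better stability than critic-free baselines like GRPO that estimate advantages from group-relative returns alone.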
Key facts
- Odysseus scales VLMs to 100+ turn decision-making in games via RL.
- Research focuses on Super Mario Land as a test environment.
- Proposes adapted PPO with a lightweight turn-level critic.
- Improves stability and efficiency over GRPO and Reinforce++.
- Pretrained VLMs provide strong initial capabilities.
- Existing methods rely on SFT or short-horizon RL (20-30 turns).
- RL training enhances long-horizon performance.
- Preprint posted on arXiv (2605.00347).
Entities
Institutions
- arXiv