RAW-Dream: Task-Agnostic World Models for VLA Reinforcement Learning
A new arXiv preprint (2605.12334) introduces RAW-Dream (Reinforcing VLAs in task-Agnostic World Dreams), a paradigm for training Vision-Language-Action (VLA) models via reinforcement learning inside world models. The method addresses a scalability limitation of existing approaches, which require task-specific data to fine-tune both the world model and the reward model. RAW-Dream disentangles world-model learning from downstream tasks: a world model pre-trained on diverse, task-free behaviors handles trajectory prediction, while an off-the-shelf Vision-Language Model (VLM) generates rewards. This enables zero-shot inference on unseen tasks and reduces reliance on costly real-world interactions.
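The pipeline described above can be illustrated with a minimal toy sketch: a task-agnostic world model rolls out imagined trajectories for the policy, a VLM stands in as the reward model, and the policy is improved only from these "dreamed" rollouts. All classes, dynamics, and the update rule below are simplified stand-ins invented for illustration, not the paper's actual models or training algorithm.

```python
import random

random.seed(0)

class WorldModel:
    """Toy task-agnostic dynamics: next_state = state + action (1-D)."""
    def predict(self, state, action):
        return state + action

class VLMReward:
    """Stand-in for an off-the-shelf VLM scoring task completion.
    Here: reward 1.0 when the imagined state matches the goal."""
    def score(self, state, goal):
        return 1.0 if abs(state - goal) < 0.5 else 0.0

class VLAPolicy:
    """Toy 'VLA' policy: a biased coin between actions -1 and +1."""
    def __init__(self):
        self.p_up = 0.5  # probability of choosing action +1
    def act(self, state):
        return 1 if random.random() < self.p_up else -1
    def update(self, trajectories):
        # Crude hill-climb toward actions that dominated rewarded rollouts.
        rewarded = [t for t in trajectories if t["return"] > 0]
        if rewarded:
            ups = sum(a == 1 for t in rewarded for a in t["actions"])
            total = sum(len(t["actions"]) for t in rewarded)
            self.p_up += 0.2 * (ups / total - self.p_up)

def dream_rollout(policy, world, vlm, goal, horizon=6):
    """One imagined trajectory: no real-world interaction occurs."""
    state, actions, ret = 0.0, [], 0.0
    for _ in range(horizon):
        a = policy.act(state)
        actions.append(a)
        state = world.predict(state, a)   # imagined transition
        ret += vlm.score(state, goal)     # VLM-generated reward
    return {"actions": actions, "return": ret}

world, vlm, policy = WorldModel(), VLMReward(), VLAPolicy()
for _ in range(50):  # RL entirely inside the world model ("dreams")
    batch = [dream_rollout(policy, world, vlm, goal=3.0) for _ in range(16)]
    policy.update(batch)

print(round(policy.p_up, 2))  # policy drifts toward goal-reaching actions
```

The point of the sketch is the separation of roles: the world model never sees the task, and the reward comes from a general-purpose scorer, so swapping in a new goal requires no retraining of either component.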
Key facts
- arXiv preprint 2605.12334 proposes RAW-Dream.
- RAW-Dream stands for Reinforcing VLAs in task-Agnostic World Dreams.
- It uses a world model pre-trained on task-free behaviors.
- Reward generation employs an off-the-shelf VLM.
- Aims to enable zero-shot inference on unseen tasks.
- Reduces sample complexity of policy training.
- Disentangles world model learning from downstream task dependencies.
- Addresses scalability limitations of existing VLA fine-tuning methods.
Entities
Institutions
- arXiv