RAW-Dream: Task-Agnostic World Models for VLA Reinforcement Learning
A new arXiv preprint (2605.12334) introduces RAW-Dream (Reinforcing VLAs in task-Agnostic World Dreams), a paradigm for training Vision-Language-Action (VLA) models via reinforcement learning inside world models. The method addresses a scalability limitation of existing approaches, which require task-specific data to fine-tune both the world model and the reward model. RAW-Dream disentangles world-model learning from downstream tasks: a world model pre-trained on diverse, task-free behaviors handles trajectory prediction, while an off-the-shelf Vision-Language Model (VLM) generates rewards. This enables zero-shot inference on unseen tasks and reduces reliance on costly real-world interactions.
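The pipeline described above can be illustrated with a minimal toy sketch: a task-agnostic world model rolls out imagined trajectories for the policy, a VLM stands in as the reward model, and the policy is improved only from these "dreamed" rollouts. All classes, dynamics, and the update rule below are simplified stand-ins invented for illustration, not the paper's actual models or training algorithm.

```python
import random

random.seed(0)

class WorldModel:
    """Toy task-agnostic dynamics: next_state = state + action (1-D)."""
    def predict(self, state, action):
        return state + action

class VLMReward:
    """Stand-in for an off-the-shelf VLM scoring task completion.
    Here: reward 1.0 when the imagined state matches the goal."""
    def score(self, state, goal):
        return 1.0 if abs(state - goal) < 0.5 else 0.0

class VLAPolicy:
    """Toy 'VLA' policy: a biased coin between actions -1 and +1."""
    def __init__(self):
        self.p_up = 0.5  # probability of choosing action +1
    def act(self, state):
        return 1 if random.random() < self.p_up else -1
    def update(self, trajectories):
        # Crude hill-climb toward actions that dominated rewarded rollouts.
        rewarded = [t for t in trajectories if t["return"] > 0]
        if rewarded:
            ups = sum(a == 1 for t in rewarded for a in t["actions"])
            total = sum(len(t["actions"]) for t in rewarded)
            self.p_up += 0.2 * (ups / total - self.p_up)

def dream_rollout(policy, world, vlm, goal, horizon=6):
    """One imagined trajectory: no real-world interaction occurs."""
    state, actions, ret = 0.0, [], 0.0
    for _ in range(horizon):
        a = policy.act(state)
        actions.append(a)
        state = world.predict(state, a)   # imagined transition
        ret += vlm.score(state, goal)     # VLM-generated reward
    return {"actions": actions, "return": ret}

world, vlm, policy = WorldModel(), VLMReward(), VLAPolicy()
for _ in range(50):  # RL entirely inside the world model ("dreams")
    batch = [dream_rollout(policy, world, vlm, goal=3.0) for _ in range(16)]
    policy.update(batch)

print(round(policy.p_up, 2))  # policy drifts toward goal-reaching actions
```

The point of the sketch is the separation of roles: the world model never sees the task, and the reward comes from a general-purpose scorer, so swapping in a new goal requires no retraining of either component.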
Key facts
- arXiv preprint 2605.12334 proposes RAW-Dream.
- RAW-Dream stands for Reinforcing VLAs in task-Agnostic World Dreams.
- It uses a world model pre-trained on task-free behaviors.
- Reward generation employs an off-the-shelf VLM.
- Aims to enable zero-shot inference on unseen tasks.
- Reduces sample complexity of policy training.
- Disentangles world model learning from downstream task dependencies.
- Addresses scalability limitations of existing VLA fine-tuning methods.
Entities
Institutions
- arXiv