KL-Constrained Adversarial Curriculum Improves World Model Learning
Researchers propose PROWL, a method that improves world model learning by actively eliciting failures. A policy is trained to find trajectories on which a diffusion-based world model makes large errors, and the world model is then fine-tuned on those trajectories. Because the adversarial curriculum is KL-constrained, the loop converts rare failures into stable training signals without drifting out of distribution. The approach targets a weakness of passively collected data, which under-samples critical transitions.
Key facts
- Modern video world models achieve short-horizon realism but fail on rare transitions.
- Passively collected data under-samples the high-impact regimes where such transitions occur.
- PROWL uses a KL-constrained adversarial curriculum.
- A policy exposes high-error trajectories of a diffusion-based world model.
- The world model is fine-tuned on adversarially discovered trajectories.
- The method avoids out-of-distribution exploitation.
- It converts rare failures into near-distribution training signals.
- The approach maintains pressure on unresolved weaknesses.
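
The loop described in the facts above can be illustrated with a toy sketch. This is not PROWL's implementation: it assumes a small finite set of candidate trajectories, a fixed reference distribution `ref_probs` standing in for the data distribution, and a per-trajectory world-model error `errors`, all of which are hypothetical. It also swaps the trained policy for the well-known closed-form maximiser of a KL-regularised objective, E_pi[error] - beta * KL(pi || ref), which is pi(t) proportional to ref(t) * exp(error(t) / beta). Fine-tuning is replaced by a stand-in rule that shrinks the error on trajectories the adversary visits.

```python
import numpy as np

def kl_tilted_policy(ref_probs, errors, beta):
    """Closed-form maximiser of E_pi[error] - beta * KL(pi || ref) on a finite set."""
    logits = np.log(ref_probs) + errors / beta
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def run_curriculum(ref_probs, errors, beta=1.0, rounds=5, lr=0.5):
    """Alternate between an adversarial policy and a stand-in fine-tuning step."""
    errors = np.asarray(errors, dtype=float).copy()
    history = []
    for _ in range(rounds):
        # Adversary: up-weight high-error trajectories while staying near ref_probs.
        policy = kl_tilted_policy(ref_probs, errors, beta)
        history.append((float(policy @ errors), kl_divergence(policy, ref_probs)))
        # Hypothetical stand-in for fine-tuning: error shrinks more on
        # trajectories the adversary samples more often.
        errors *= 1.0 - lr * policy
    return errors, history

# Hypothetical toy data: common trajectories are modelled well, rare ones are not.
ref_probs = np.array([0.70, 0.25, 0.04, 0.01])
errors = np.array([0.1, 0.2, 1.5, 3.0])

final_errors, history = run_curriculum(ref_probs, errors)
for round_index, (expected_error, kl) in enumerate(history):
    print(f"round {round_index}: adversary expected error={expected_error:.3f}, KL={kl:.3f}")
```

In this sketch, a smaller `beta` lets the adversary concentrate on the rarest, highest-error trajectories, while a larger `beta` keeps it close to the reference distribution. As the error on one trajectory falls, the tilted policy shifts its weight to whichever trajectories remain poorly modelled, which is one simple way to picture sustained pressure on unresolved weaknesses.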
Entities
Institutions
- arXiv