KL-Constrained Adversarial Curriculum Improves World Model Learning

ai-technology · 2026-05-20

Researchers propose PROWL, a method that improves world model learning by actively eliciting failures. A policy is trained to find trajectories on which a diffusion-based world model has high error, and the world model is then fine-tuned on those trajectories. This adversarial loop converts rare failures into stable training signals without drifting out of distribution. The approach targets a weakness of passively collected data, which under-samples critical transitions.

Key facts

Modern video world models achieve short-horizon realism but fail on rare transitions.
Passive data under-samples high-impact regimes.
PROWL uses a KL-constrained adversarial curriculum.
A policy exposes high-error trajectories of a diffusion-based world model.
The world model is fine-tuned on adversarially discovered trajectories.
The method avoids out-of-distribution exploitation.
It converts rare failures into near-distribution training signals.
The approach maintains pressure on unresolved weaknesses.
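
The summary does not state PROWL's exact objective, so the following is only a minimal sketch of how a KL constraint can keep an adversary near a reference distribution. It assumes a KL-penalized reward over a small, finite set of candidate trajectories, for which the maximizer of E_pi[error] - beta * KL(pi || ref) has the standard closed form pi(i) proportional to ref(i) * exp(error(i) / beta). The names `kl_tilted_policy`, `ref_probs`, `errors`, and `beta` are hypothetical and are not taken from the paper.

```python
import math

def kl_tilted_policy(ref_probs, errors, beta):
    """Closed-form maximizer of E_pi[error] - beta * KL(pi || ref).

    ref_probs: reference (data-like) probabilities over candidate trajectories.
    errors:    world-model error on each candidate (assumed given).
    beta:      KL penalty weight; larger values keep pi closer to ref.
    """
    weights = [p * math.exp(e / beta) for p, e in zip(ref_probs, errors)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative numbers only: the third trajectory is rare under the
# reference distribution but is where the world model fails most.
ref_probs = [0.70, 0.25, 0.05]
errors = [0.1, 0.3, 2.0]

loose = kl_tilted_policy(ref_probs, errors, beta=0.5)  # weak constraint
tight = kl_tilted_policy(ref_probs, errors, beta=5.0)  # strong constraint

print("loose:", [round(x, 3) for x in loose], "KL:", round(kl_divergence(loose, ref_probs), 3))
print("tight:", [round(x, 3) for x in tight], "KL:", round(kl_divergence(tight, ref_probs), 3))
```

In this toy setting both policies up-weight the rare, high-error trajectory, but the larger `beta` keeps the sampling distribution much closer to the reference. That is the trade-off a KL-constrained curriculum of this kind has to manage: enough pressure to surface failures, without sampling far from the data.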
