Nano World Models: Minimalist Codebase for Future Video Prediction
Nano World Models is a newly released minimalist codebase for research on future video prediction. Built around diffusion forcing, it provides a unified interface to generative objectives, model scales, action conditioning, latent observation spaces, datasets, evaluation protocols, and long-horizon rollouts. The goal is to enable controlled studies of world-modeling components that are often entangled in existing implementations. Experiments cover simple control environments, game simulation, and real-robot data. Industry-scale interactive video generation has advanced rapidly, yet the research community still lacks compact, reproducible, and extensible implementations; the codebase is designed to fill that gap. The work is described in arXiv preprint 2605.23993.
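For readers unfamiliar with diffusion forcing, its central training idea is that each frame in a sequence receives its own independently sampled noise level, instead of one noise level shared by the whole clip. The NumPy sketch below shows only that noising step. It is not the Nano World Models API. The function name `diffusion_forcing_noise`, the linear beta schedule, and the use of the noise as the training target (epsilon prediction) are all illustrative assumptions.

```python
import numpy as np


def diffusion_forcing_noise(frames, num_steps, rng):
    """Noise each frame of a clip at its own independently sampled timestep.

    Hypothetical helper for illustration only (not from Nano World Models).
    frames: array of shape (T, C, H, W) holding T frames.
    Returns (noisy_frames, per_frame_timesteps, noise).
    """
    T = frames.shape[0]
    # Assumed linear beta schedule; real codebases may use other schedules.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    # Diffusion forcing: one timestep per frame, not one per clip.
    t = rng.integers(0, num_steps, size=T)
    noise = rng.standard_normal(frames.shape)
    # Broadcast each frame's alpha_bar over its channel and spatial dims.
    a = alpha_bar[t].reshape(T, *([1] * (frames.ndim - 1)))
    noisy = np.sqrt(a) * frames + np.sqrt(1.0 - a) * noise
    # Under the assumed epsilon-prediction objective, `noise` is the target.
    return noisy, t, noise


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 3, 16, 16))
noisy, t, eps = diffusion_forcing_noise(clip, 1000, rng)
print(noisy.shape, t)
```

Because noise levels vary per frame, a model trained this way can, at sampling time, condition on clean past frames while denoising noisier future ones, which is what makes the objective attractive for long-horizon rollouts.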
Key facts
- Nano World Models is a minimalist codebase for future video prediction.
- It centers around diffusion forcing.
- Provides a unified interface for generative objectives, model scales, action conditioning, latent observation spaces, datasets, evaluation protocols, and long-horizon rollouts.
- Enables controlled studies of world-modeling components.
- Experiments conducted on simple control environments, game simulation, and real-robot data.
- Addresses lack of compact, reproducible, and extensible implementations in the research community.
- Published as arXiv preprint 2605.23993.
Entities
Institutions
- arXiv