Nano World Models: Minimalist Codebase for Future Video Prediction

publication · 2026-05-26

Nano World Models is a newly released minimalist codebase for research on future video prediction. It is built around diffusion forcing and exposes a unified interface across generative objectives, model scales, action conditioning, latent observation spaces, datasets, evaluation protocols, and long-horizon rollouts. The goal is to enable controlled studies of world-modeling components that are often entangled across different implementations. Experiments cover simple control environments, game simulation, and real-robot data. The codebase is designed to be compact, reproducible, and easy to extend. It addresses a gap in the research community: industry-scale interactive video generation has advanced quickly, but compact, reproducible, and extensible research implementations have been lacking. The work is described in arXiv preprint 2605.23993.
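For readers unfamiliar with diffusion forcing, its core idea as generally described in the literature is that each frame in a sequence receives its own independently sampled noise level during training, rather than one level shared by the whole clip. The following is a minimal sketch of that noising step only. It is not code from Nano World Models: the function names, the cosine schedule, and the toy list-of-floats frame representation are all assumptions made for illustration.

```python
import math
import random


def cosine_alpha_bar(k, num_levels):
    """Fraction of signal kept at noise level k (0 = clean, num_levels = pure noise).

    The cosine schedule is an illustrative assumption, not the codebase's choice.
    """
    return math.cos(0.5 * math.pi * k / num_levels) ** 2


def noise_frames(frames, num_levels, rng):
    """Corrupt each frame with its own independently sampled noise level.

    frames: list of frames, each a flat list of floats (hypothetical toy format).
    Returns the noisy frames and the per-frame noise levels.
    """
    noisy, levels = [], []
    for frame in frames:
        k = rng.randint(0, num_levels)  # independent level for this frame
        a = cosine_alpha_bar(k, num_levels)
        noisy.append(
            [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in frame]
        )
        levels.append(k)
    return noisy, levels


rng = random.Random(0)
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
noisy, levels = noise_frames(frames, num_levels=10, rng=rng)
```

In a training loop, a denoising model would receive the noisy frames together with the per-frame levels and be trained to recover the clean frames (or the added noise). Because levels vary per frame, the same model can later condition on clean past frames while denoising future ones, which is what makes the approach a natural fit for long rollouts.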

Key facts

Nano World Models is a minimalist codebase for future video prediction.
It centers on diffusion forcing.
It provides a unified interface for generative objectives, model scales, action conditioning, latent observation spaces, datasets, evaluation protocols, and long-horizon rollouts.
It enables controlled studies of world-modeling components.
Experiments were conducted on simple control environments, game simulation, and real-robot data.
It addresses the lack of compact, reproducible, and extensible implementations in the research community.
It is published as arXiv preprint 2605.23993.
