iWorld-Bench: New Benchmark for Interactive World Models

ai-technology · 2026-05-07

Researchers have released iWorld-Bench, a benchmark for training and evaluating world models on interaction-related skills such as distance perception and memory. It draws on a dataset of 330,000 video clips, from which 2,100 high-quality samples were selected to cover different scenes, weather conditions, and viewpoints. To standardize evaluation across interaction modalities, the researchers built an Action Generation Framework that yields 4,900 test samples across six task types, which together assess visual generation, trajectory following, and memory. The benchmark is intended to address the lack of large-scale datasets and standardized benchmarks for assessing physical interaction skills in artificial general intelligence research.

Key facts

iWorld-Bench is a benchmark for interactive world models.
Dataset includes 330,000 video clips.
2,100 high-quality samples selected.
Samples cover varied perspectives, weather, and scenes.
Action Generation Framework unifies evaluation.
Six task types with 4,900 test samples.
Tasks assess visual generation, trajectory following, and memory.
Addresses lack of large-scale datasets and unified benchmarks.

Sources

arXiv cs.AI — 2026-05-06