Lifted World Models Enable High-Level Planning for Embodied Agents
The framework introduced in arXiv preprint 2604.26182 presents lifted world models, which translate high-level actions into sequences of low-level joint actions, enabling efficient planning for complex embodiments such as human-like agents. Conventional world models predict future observations conditioned on actions, but high-dimensional action spaces (e.g., controlling every joint of a humanoid) make search-based planning techniques such as the cross-entropy method (CEM) prohibitively expensive. The approach trains a lightweight policy that composes with a frozen world model, yielding a lifted model that predicts future observations from a single high-level action. For human-like embodiments, the high-level action space is a small set of 2D waypoints marked on the current observation frame, each specifying a near-term goal position for a leaf joint (pelvis), which simplifies planning and improves the scalability of control.
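The composition described above can be sketched in a toy form. The snippet below is illustrative only, not the paper's implementation: `world_model`, `policy`, and the simple linear dynamics are all stand-ins, and the "joint action" is reduced to a vector nudge so the structure (policy expands one waypoint into k low-level actions, which are rolled through the frozen world model) stays visible.

```python
import numpy as np

# Stand-ins for learned components; names and dynamics are illustrative,
# not taken from the paper.
def world_model(obs, low_action):
    # Frozen low-level world model: predicts the next observation
    # from the current observation and a joint-space action.
    return obs + 0.1 * low_action

def policy(obs, waypoint, k=5):
    # Lightweight policy: expands one high-level 2D waypoint into
    # k low-level actions that steer the first two obs dims toward it.
    actions = []
    state = obs.copy()
    for _ in range(k):
        delta = np.zeros_like(state)
        delta[:2] = waypoint - state[:2]
        actions.append(delta)
        state = world_model(state, delta)
    return actions

def lifted_world_model(obs, waypoint):
    # Lifted model: one high-level action -> predicted future observation,
    # obtained by composing the policy with the frozen world model.
    for a in policy(obs, waypoint):
        obs = world_model(obs, a)
    return obs
```

In this toy setup, a single call to `lifted_world_model` replaces k queries to the low-level model, which is what lets a planner search over waypoints instead of joint trajectories.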
Key facts
- arXiv preprint 2604.26182 introduces lifted world models for planning and control.
- World models predict future observations conditioned on agent actions.
- High-dimensional action spaces (e.g., human joint control) make planning expensive.
- A lightweight policy maps high-level actions to sequences of low-level joint actions.
- The policy composes with a frozen world model to create a lifted world model.
- High-level actions are defined as 2D waypoints on the current observation frame.
- Each waypoint specifies a near-term goal position for a leaf joint (pelvis).
- The framework aims to improve scalability of search-based planning methods like CEM.
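The scalability claim in the last fact can be made concrete with a minimal CEM loop over the 2D waypoint space. This is a generic cross-entropy method sketch, not the paper's planner: the quadratic `score_fn` is a hypothetical stand-in for rolling out the lifted world model and scoring the predicted observation.

```python
import numpy as np

def cem_plan(score_fn, dim=2, pop=64, elites=8, iters=10, seed=0):
    # Cross-entropy method in the low-dimensional high-level action space:
    # sample candidate 2D waypoints, keep the top scorers, refit the Gaussian.
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        cands = mean + std * rng.standard_normal((pop, dim))
        scores = np.array([score_fn(c) for c in cands])
        best = cands[np.argsort(scores)[-elites:]]
        mean, std = best.mean(axis=0), best.std(axis=0) + 1e-6
    return mean

# Toy objective: prefer waypoints near a goal at (1.0, 0.5); in the real
# system the score would come from the lifted world model's prediction.
goal = np.array([1.0, 0.5])
plan = cem_plan(lambda w: -np.sum((w - goal) ** 2))
```

Because the search distribution lives in 2 dimensions rather than the full joint-action space, the same population size covers the candidate space far more densely, which is the scalability benefit the framework targets.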
Entities
Institutions
- arXiv