Lifted World Models Enable High-Level Planning for Embodied Agents
The framework introduced in arXiv preprint 2604.26182 presents lifted world models, which translate high-level actions into sequences of low-level joint actions, enabling efficient planning for complex embodiments such as human-like agents. Conventional world models predict future observations conditioned on actions, but high-dimensional action spaces (e.g., controlling every joint of a humanoid) make search-based planning techniques such as the cross-entropy method (CEM) prohibitively expensive. The approach trains a lightweight policy that composes with a frozen world model, yielding a lifted model that predicts future observations from a single high-level action. For human-like embodiments, the high-level action space is a small set of 2D waypoints marked on the current observation frame, each specifying a near-term goal position for a leaf joint (pelvis), which simplifies planning and improves the scalability of control.
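The composition described above can be sketched in a toy form. The snippet below is illustrative only, not the paper's implementation: `world_model`, `policy`, and the simple linear dynamics are all stand-ins, and the "joint action" is reduced to a vector nudge so the structure (policy expands one waypoint into k low-level actions, which are rolled through the frozen world model) stays visible.

```python
import numpy as np

# Stand-ins for learned components; names and dynamics are illustrative,
# not taken from the paper.
def world_model(obs, low_action):
    # Frozen low-level world model: predicts the next observation
    # from the current observation and a joint-space action.
    return obs + 0.1 * low_action

def policy(obs, waypoint, k=5):
    # Lightweight policy: expands one high-level 2D waypoint into
    # k low-level actions that steer the first two obs dims toward it.
    actions = []
    state = obs.copy()
    for _ in range(k):
        delta = np.zeros_like(state)
        delta[:2] = waypoint - state[:2]
        actions.append(delta)
        state = world_model(state, delta)
    return actions

def lifted_world_model(obs, waypoint):
    # Lifted model: one high-level action -> predicted future observation,
    # obtained by composing the policy with the frozen world model.
    for a in policy(obs, waypoint):
        obs = world_model(obs, a)
    return obs
```

In this toy setup, a single call to `lifted_world_model` replaces k queries to the low-level model, which is what lets a planner search over waypoints instead of joint trajectories.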
Key facts
- arXiv preprint 2604.26182 introduces lifted world models for planning and control.
- World models predict future observations conditioned on agent actions.
- High-dimensional action spaces (e.g., human joint control) make planning expensive.
- A lightweight policy maps high-level actions to sequences of low-level joint actions.
- The policy composes with a frozen world model to create a lifted world model.
- High-level actions are defined as 2D waypoints on the current observation frame.
- Each waypoint specifies a near-term goal position for a leaf joint (pelvis).
- The framework aims to improve scalability of search-based planning methods like CEM.
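The scalability claim in the last fact can be made concrete with a minimal CEM loop over the 2D waypoint space. This is a generic cross-entropy method sketch, not the paper's planner: the quadratic `score_fn` is a hypothetical stand-in for rolling out the lifted world model and scoring the predicted observation.

```python
import numpy as np

def cem_plan(score_fn, dim=2, pop=64, elites=8, iters=10, seed=0):
    # Cross-entropy method in the low-dimensional high-level action space:
    # sample candidate 2D waypoints, keep the top scorers, refit the Gaussian.
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        cands = mean + std * rng.standard_normal((pop, dim))
        scores = np.array([score_fn(c) for c in cands])
        best = cands[np.argsort(scores)[-elites:]]
        mean, std = best.mean(axis=0), best.std(axis=0) + 1e-6
    return mean

# Toy objective: prefer waypoints near a goal at (1.0, 0.5); in the real
# system the score would come from the lifted world model's prediction.
goal = np.array([1.0, 0.5])
plan = cem_plan(lambda w: -np.sum((w - goal) ** 2))
```

Because the search distribution lives in 2 dimensions rather than the full joint-action space, the same population size covers the candidate space far more densely, which is the scalability benefit the framework targets.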
Entities
Institutions
- arXiv