LongAct Benchmark and HoloMind Agent for Long-Horizon Household Tasks
LongAct is a benchmark that evaluates planning-level autonomy in long-horizon household activities specified by free-form instructions. It addresses a gap in existing embodied AI benchmarks, which concentrate on short-horizon navigation or manipulation. LongAct abstracts away embodiment-specific low-level control so that evaluation can focus on higher-level cognitive skills: instruction understanding, dependency management, memory maintenance, and adaptive planning. Alongside LongAct, the authors present HoloMind, an agent driven by a vision-language model (VLM) that combines a DAG-based hierarchical planner for long-horizon tasks, a Multimodal Spatial Memory for continuous world modeling, an Episodic Memory for experience reuse, and a global Critic for reflective supervision. Experiments with GPT-5 and Qwen3-VL show that HoloMind substantially improves long-horizon task performance.
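To make the idea of DAG-based planning concrete, here is a minimal sketch of how subtask dependencies in a household task could be represented as a directed acyclic graph and scheduled in dependency order. It is not HoloMind's actual planner, which the summary does not describe in detail. The task, the subtask names, and the helper `plan_in_waves` are all hypothetical and chosen only for illustration. The sketch uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for an instruction like "make tea and serve it".
# Each key maps to the set of subtasks that must finish before it can start.
dependencies = {
    "find_kettle": set(),
    "fill_kettle": {"find_kettle"},
    "boil_water": {"fill_kettle"},
    "find_mug": set(),
    "add_tea_bag": {"find_mug"},
    "pour_water": {"boil_water", "add_tea_bag"},
    "serve_tea": {"pour_water"},
}


def plan_in_waves(deps):
    """Group subtasks into waves; each wave only contains subtasks
    whose prerequisites were all completed in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a stable order
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


waves = plan_in_waves(dependencies)
for index, wave in enumerate(waves, start=1):
    print(f"wave {index}: {', '.join(wave)}")
```

Subtasks within one wave have no dependencies on each other, so an agent could attempt them in any order. A real planner would presumably also need to revise the graph when a subtask fails or the environment changes, which this sketch leaves out.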
Key facts
- LongAct is a benchmark for long-horizon household tasks with free-form instructions.
- Existing embodied AI benchmarks emphasize short-horizon navigation or manipulation.
- LongAct abstracts away embodiment-specific low-level control.
- HoloMind is a VLM-driven agent with a DAG-based planner.
- HoloMind includes Multimodal Spatial Memory, Episodic Memory, and a global Critic.
- Experiments used GPT-5 and Qwen3-VL models.
- HoloMind substantially improves long-horizon task performance.
- The work is published on arXiv with ID 2605.14504.
Entities
Institutions
- arXiv