LongAct Benchmark and HoloMind Agent for Long-Horizon Household Tasks

ai-technology · 2026-05-16

LongAct is a new benchmark that evaluates planning-level autonomy in long-horizon household activities specified by free-form instructions. It addresses a gap in existing embodied AI benchmarks, which concentrate mainly on short-horizon navigation and manipulation. LongAct abstracts away embodiment-specific low-level control so that evaluation can focus on higher-level cognitive skills: understanding instructions, managing dependencies, maintaining memory, and planning adaptively. Alongside LongAct, the researchers present HoloMind, an agent driven by a vision-language model (VLM) that combines a DAG-based hierarchical planner for long-horizon tasks, a Multimodal Spatial Memory for ongoing world modeling, an Episodic Memory for reusing past experience, and a global Critic for reflective oversight. Experiments with GPT-5 and Qwen3-VL show that HoloMind substantially improves performance on long-horizon tasks.

Key facts

LongAct is a benchmark for long-horizon household tasks with free-form instructions.
Existing embodied AI benchmarks emphasize short-horizon navigation or manipulation.
LongAct abstracts away embodiment-specific low-level control.
HoloMind is a VLM-driven agent with a DAG-based planner.
HoloMind includes Multimodal Spatial Memory, Episodic Memory, and a global Critic.
Experiments used GPT-5 and Qwen3-VL models.
HoloMind substantially improves long-horizon task performance.
The work is published on arXiv with ID 2605.14504.
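
To make the idea of a DAG-based planner more concrete, here is a minimal Python sketch of how subtasks with dependencies can be ordered for execution. It is an illustration only and not HoloMind's actual planner: the function name plan_batches, the example task, and the subtask names are all hypothetical, and the ordering relies on the standard-library graphlib module rather than anything described in the paper.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies):
    """Group subtasks into batches whose prerequisites are already done.

    `dependencies` maps each subtask to the set of subtasks it depends on.
    Subtasks in the same batch do not depend on each other.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch as finished to unlock successors
    return batches


# Hypothetical household task: subtask -> prerequisites.
MAKE_TEA = {
    "find_mug": set(),
    "boil_water": set(),
    "brew_tea": {"find_mug", "boil_water"},
    "serve_tea": {"brew_tea"},
}

if __name__ == "__main__":
    for step, batch in enumerate(plan_batches(MAKE_TEA), start=1):
        print(step, batch)
```

In this sketch, finding the mug and boiling the water have no prerequisites and land in the first batch, while brewing and serving follow in order. A real agent would presumably revise the graph as the Critic and the memories supply new information; that part is not modeled here.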
