ARTFEED — Contemporary Art Intelligence

LongAct Benchmark and HoloMind Agent for Long-Horizon Household Tasks

ai-technology · 2026-05-16

Researchers have introduced LongAct, a benchmark that assesses planning-level autonomy in long-horizon household activities specified by free-form instructions. It fills a gap in current embodied AI benchmarks, which concentrate mainly on short-horizon navigation or manipulation. LongAct abstracts away embodiment-specific low-level control so that evaluation can focus on higher-level cognitive skills: instruction understanding, dependency management, memory maintenance, and adaptive planning. Alongside LongAct, the researchers present HoloMind, a VLM-driven agent that combines a DAG-based hierarchical planner for long-horizon tasks, a Multimodal Spatial Memory for ongoing world modeling, an Episodic Memory for experience reuse, and a global Critic for reflective oversight. Experiments with GPT-5 and Qwen3-VL show that HoloMind substantially improves long-horizon task performance.
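To make the idea of a DAG-based planner more concrete, the sketch below shows how a household instruction might be broken into subtasks whose dependencies form a directed acyclic graph, and how a valid execution order can be derived from it. This is only an illustration under stated assumptions, not HoloMind's actual planner: the subtask names, the `subtask_dependencies` mapping, and the `plan_order` helper are all hypothetical, and the ordering uses plain topological sorting from Python's standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical subtasks for an instruction such as
# "make tea and bring it to the living room".
# Each key maps to the set of subtasks that must finish before it can start.
subtask_dependencies = {
    "fill_kettle": set(),
    "boil_water": {"fill_kettle"},
    "fetch_cup": set(),
    "steep_tea": {"boil_water", "fetch_cup"},
    "carry_cup_to_living_room": {"steep_tea"},
}


def plan_order(dependencies):
    """Return one valid execution order for the subtask graph.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    i.e. if the graph is not actually a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = plan_order(subtask_dependencies)
print(order)
```

Independent subtasks such as `fill_kettle` and `fetch_cup` may appear in either order, which is the property that lets a planner reorder or interleave work while still respecting every dependency.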

Key facts

  • LongAct is a benchmark for long-horizon household tasks with free-form instructions.
  • Existing embodied AI benchmarks emphasize short-horizon navigation or manipulation.
  • LongAct abstracts away embodiment-specific low-level control.
  • HoloMind is a VLM-driven agent with a DAG-based planner.
  • HoloMind includes Multimodal Spatial Memory, Episodic Memory, and a global Critic.
  • Experiments used GPT-5 and Qwen3-VL models.
  • HoloMind substantially improves long-horizon task performance.
  • The work is published on arXiv with ID 2605.14504.

Entities

Institutions

  • arXiv
