ARTFEED — Contemporary Art Intelligence

InteractWeb-Bench: Benchmarking Multimodal Agents for Interactive Website Generation

ai-technology · 2026-05-01

A team of researchers has unveiled InteractWeb-Bench, the first multimodal interactive benchmark for website generation by non-expert users working with low-code tools. The benchmark targets the semantic misalignment that arises when agents faithfully execute unclear, low-quality instructions from such users, a failure mode the authors call "blind execution." It introduces four distinct user agents and persona-driven instruction perturbations to simulate a range of user behaviors, including ambiguity, redundancy, and contradiction. The research is available on arXiv under the ID 2604.27419.

Key facts

  • InteractWeb-Bench is the first multimodal interactive benchmark for website generation under non-expert low-code user conditions.
  • It addresses the failure mode called "blind execution" caused by semantic misalignment.
  • The benchmark introduces four types of user agents and persona-driven instruction perturbations.
  • It simulates user behaviors including ambiguity, redundancy, and contradiction.
  • The research is published on arXiv with ID 2604.27419.
  • The work focuses on multimodal large language models (MLLMs) and coding agents.
  • Existing benchmarks rely on idealized assumptions with well-structured inputs and static execution settings.
  • Real-world development is constrained by ambiguous, low-quality instructions from non-expert users.
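To make the perturbation idea concrete, here is a minimal sketch of how persona-driven instruction perturbations of the kinds listed above (ambiguity, redundancy, contradiction) might be simulated. All function names, persona labels, and rewrite rules below are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch of persona-driven instruction perturbation.
# The persona names and rewrite rules are assumptions for illustration;
# they are not taken from the InteractWeb-Bench paper.

def perturb_instruction(instruction: str, persona: str) -> str:
    """Degrade a clear instruction according to a user persona."""
    if persona == "ambiguous":
        # Replace concrete UI terms with vague references.
        vague = {
            "the button": "that thing",
            "the navbar": "the bar at the top",
            "the footer": "the bottom part",
        }
        for term, replacement in vague.items():
            instruction = instruction.replace(term, replacement)
        return instruction
    if persona == "redundant":
        # Restate the same request with filler words.
        return f"{instruction} Also, just to repeat: {instruction.lower()}"
    if persona == "contradictory":
        # Append a clause that conflicts with the original request.
        return f"{instruction} Actually, do the opposite of that."
    # Unknown persona: leave the instruction unchanged.
    return instruction


print(perturb_instruction("Make the button blue.", "ambiguous"))
# prints "Make that thing blue."
```

An interactive benchmark would feed such perturbed instructions to a coding agent and measure whether it asks clarifying questions or "blindly executes" the degraded request.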

Entities

Institutions

  • arXiv
