InteractWeb-Bench: Benchmarking Multimodal Agents for Interactive Website Generation
Researchers have introduced InteractWeb-Bench, the first multimodal interactive benchmark for website generation by non-expert users of low-code tools. The benchmark targets the semantic misalignment that arises when agents faithfully carry out the ambiguous, low-quality instructions such users provide, a failure mode the authors call "blind execution." It introduces four types of user agents and persona-driven instruction perturbations to simulate user behaviors including ambiguity, redundancy, and contradiction. The work is available on arXiv as 2604.27419.
Key facts
- InteractWeb-Bench is the first multimodal interactive benchmark for website generation under non-expert low-code user conditions.
- It addresses the failure mode called "blind execution" caused by semantic misalignment.
- The benchmark introduces four types of user agents and persona-driven instruction perturbations.
- It simulates user behaviors including ambiguity, redundancy, and contradiction.
- The research is published on arXiv with ID 2604.27419.
- The work focuses on multimodal large language models (MLLMs) and coding agents.
- Existing benchmarks rely on idealized assumptions with well-structured inputs and static execution settings.
- Real-world development is constrained by ambiguous, low-quality instructions from non-expert users.
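The perturbation idea above can be sketched in code. The following is a minimal, hypothetical illustration of how persona-driven instruction degradation might work; the behavior names, rewrite rules, and function names are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch: degrade a clear instruction to mimic the
# non-expert user behaviors InteractWeb-Bench simulates (ambiguity,
# redundancy, contradiction). Rules here are illustrative only.

BEHAVIORS = ("ambiguity", "redundancy", "contradiction")

def perturb(instruction: str, behavior: str) -> str:
    """Return a degraded version of a clear instruction."""
    if behavior == "ambiguity":
        # Swap a concrete target for a vague reference.
        return instruction.replace("the navigation bar", "that thing at the top")
    if behavior == "redundancy":
        # Restate the request without adding new information.
        return f"{instruction} I mean, {instruction.lower()}"
    if behavior == "contradiction":
        # Append a requirement that conflicts with the original.
        return f"{instruction} Actually, leave it exactly as it is."
    raise ValueError(f"unknown behavior: {behavior}")

clear = "Make the navigation bar blue."
for b in BEHAVIORS:
    print(b, "->", perturb(clear, b))
```

A user agent built this way could sit between a scripted task specification and the coding agent under test, so that the same underlying task is delivered under each behavior and the agent's robustness to each can be compared.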