InteractWeb-Bench: Benchmarking Multimodal Agents for Interactive Website Generation
Researchers have introduced InteractWeb-Bench, the first multimodal interactive benchmark for website generation by non-expert users of low-code tools. The benchmark targets the semantic misalignment that arises when agents faithfully carry out the ambiguous, low-quality instructions such users provide, a failure mode the authors call "blind execution." It introduces four types of user agents and persona-driven instruction perturbations to simulate user behaviors including ambiguity, redundancy, and contradiction. The work is available on arXiv as 2604.27419.
Key facts
- InteractWeb-Bench is the first multimodal interactive benchmark for website generation under non-expert low-code user conditions.
- It addresses the failure mode called "blind execution" caused by semantic misalignment.
- The benchmark introduces four types of user agents and persona-driven instruction perturbations.
- It simulates user behaviors including ambiguity, redundancy, and contradiction.
- The research is published on arXiv with ID 2604.27419.
- The work focuses on multimodal large language models (MLLMs) and coding agents.
- Existing benchmarks rely on idealized assumptions with well-structured inputs and static execution settings.
- Real-world development is constrained by ambiguous, low-quality instructions from non-expert users.
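The perturbation idea above can be sketched in code. The following is a minimal, hypothetical illustration of how persona-driven instruction degradation might work; the behavior names, rewrite rules, and function names are illustrative assumptions, not the benchmark's actual implementation.

```python
# Hypothetical sketch: degrade a clear instruction to mimic the
# non-expert user behaviors InteractWeb-Bench simulates (ambiguity,
# redundancy, contradiction). Rules here are illustrative only.

BEHAVIORS = ("ambiguity", "redundancy", "contradiction")

def perturb(instruction: str, behavior: str) -> str:
    """Return a degraded version of a clear instruction."""
    if behavior == "ambiguity":
        # Swap a concrete target for a vague reference.
        return instruction.replace("the navigation bar", "that thing at the top")
    if behavior == "redundancy":
        # Restate the request without adding new information.
        return f"{instruction} I mean, {instruction.lower()}"
    if behavior == "contradiction":
        # Append a requirement that conflicts with the original.
        return f"{instruction} Actually, leave it exactly as it is."
    raise ValueError(f"unknown behavior: {behavior}")

clear = "Make the navigation bar blue."
for b in BEHAVIORS:
    print(b, "->", perturb(clear, b))
```

A user agent built this way could sit between a scripted task specification and the coding agent under test, so that the same underlying task is delivered under each behavior and the agent's robustness to each can be compared.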