MPR-GUI: Benchmarking Multilingual GUI Agent Perception and Reasoning
Researchers have introduced MPR-GUI-Bench, a benchmark for assessing the multilingual perception and reasoning (P&R) abilities of large vision-language models (LVLMs) acting as graphical user interface (GUI) agents. It addresses two significant gaps in current GUI evaluations: the lack of fine-grained diagnostics for pinpointing specific P&R failures, and the absence of strictly aligned cross-lingual evaluation environments. MPR-GUI-Bench provides aligned settings across six languages and eight fine-grained P&R tasks. Initial results reveal consistent performance gaps between English and non-English settings, especially on reasoning-intensive tasks. The work is available on arXiv under identifier 2512.00756.
Key facts
- MPR-GUI-Bench is a multilingual benchmark for GUI agents.
- It evaluates perception and reasoning (P&R) capabilities of LVLMs.
- The benchmark covers six languages and eight fine-grained P&R tasks.
- Existing benchmarks lack fine-grained diagnostics for P&R failures.
- Existing benchmarks lack strictly aligned cross-lingual evaluation environments.
- MPR-GUI-Bench provides strictly aligned environments across languages.
- Results show consistent P&R gaps between English and non-English settings.
- The work is published on arXiv (ID: 2512.00756).