MPR-GUI: Benchmarking Multilingual GUI Agent Perception and Reasoning
Researchers have introduced MPR-GUI-Bench, a benchmark for assessing the multilingual perception and reasoning (P&R) abilities of large vision-language models (LVLMs) acting as graphical user interface (GUI) agents. It addresses two significant gaps in current GUI evaluations: the lack of fine-grained diagnostics for pinpointing specific P&R failures, and the absence of strictly aligned cross-lingual evaluation environments. MPR-GUI-Bench provides aligned settings across six languages and eight fine-grained P&R tasks. Initial results reveal consistent performance gaps between English and non-English settings, especially on reasoning-intensive tasks. The work is available on arXiv under identifier 2512.00756.
Key facts
- MPR-GUI-Bench is a multilingual benchmark for GUI agents.
- It evaluates perception and reasoning (P&R) capabilities of LVLMs.
- The benchmark covers six languages and eight fine-grained P&R tasks.
- Existing benchmarks lack fine-grained diagnostics for P&R failures.
- Existing benchmarks lack strictly aligned cross-lingual evaluation environments.
- MPR-GUI-Bench provides strictly aligned environments across languages.
- Results show consistent P&R gaps between English and non-English settings.
- The work is published on arXiv (ID: 2512.00756).