MGSM-Pro: New Benchmark for Multilingual Math Reasoning in LLMs
Researchers introduced MGSM-Pro, a multilingual mathematical reasoning benchmark that extends the MGSM dataset with GSM-Symbolic's template-instantiation approach. The benchmark provides five variations of each question by altering names, digits, and irrelevant context. Evaluations across nine languages reveal significant accuracy drops on digit variations for low-resource languages; robustness observed in high-resource languages does not transfer to low-resource ones. Proprietary models, including Gemini 2.5 Flash and GPT-4.1, were tested.
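The instantiation idea can be sketched as follows. This is a hypothetical illustration in the spirit of GSM-Symbolic, not the actual MGSM-Pro pipeline: the template, the name list, and the `instantiate` helper are all assumptions made for the example. A question template with placeholders for a name and two numbers is sampled repeatedly, and the gold answer is recomputed from the sampled digits so each variation stays verifiable.

```python
import random

# Hypothetical GSM-style template (not from the paper): placeholders for a
# name and two numeric values; the gold answer is derived from the digits.
TEMPLATE = "{name} has {a} apples and buys {b} more. How many apples does {name} have now?"
NAMES = ["Sofia", "Amara", "Kenji", "Leila"]  # illustrative name pool

def instantiate(template: str, rng: random.Random):
    """Produce one variation by swapping the name and the digits."""
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    question = template.format(name=rng.choice(NAMES), a=a, b=b)
    return question, a + b  # gold answer tracks the sampled digits

rng = random.Random(0)
# Five instantiations per question, mirroring the benchmark's setup.
variants = [instantiate(TEMPLATE, rng) for _ in range(5)]
for question, answer in variants:
    print(question, "->", answer)
```

A model that merely memorized the surface form of the original question will fail when the digits change, which is exactly the failure mode the digit-variation results probe.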
Key facts
- MGSM-Pro extends MGSM with the GSM-Symbolic instantiation approach
- Five instantiations per question by varying names, digits, and irrelevant context
- Evaluated across nine languages
- Low-resource languages suffer large performance drops on digit variations
- Robustness in high-resource languages does not transfer to low-resource languages
- Proprietary models tested include Gemini 2.5 Flash and GPT-4.1
- Published on arXiv with ID 2601.21225
- Announce type: replace-cross
Entities
Institutions
- arXiv