ARTFEED — Contemporary Art Intelligence

MGSM-Pro: New Benchmark for Multilingual Math Reasoning in LLMs

other · 2026-04-30

Researchers have introduced MGSM-Pro, a multilingual mathematical reasoning benchmark that extends the MGSM dataset using the instantiation approach of GSM-Symbolic. Each question comes in five variations that alter names, digits, and irrelevant context. Evaluations across nine languages show significant performance drops for low-resource languages on digit variations, and robustness in high-resource languages does not transfer to low-resource ones. The models tested include proprietary systems such as Gemini 2.5 Flash and GPT-4.1.
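
To make the instantiation idea concrete, here is a minimal Python sketch of how a single templated question might yield five variants by changing names, digits, and irrelevant context. The template, name list, distractor sentence, and the `instantiate` function are all hypothetical illustrations, not taken from MGSM-Pro or GSM-Symbolic.

```python
import random

# Hypothetical template and value pools, invented for illustration only.
TEMPLATE = (
    "{name} has {a} apples and buys {b} more. "
    "{distractor}How many apples does {name} have now?"
)
NAMES = ["Amina", "Kofi", "Lin", "Sofia"]
DISTRACTOR = "The shop is 3 km from home. "  # irrelevant to the answer


def instantiate(seed, vary_names=True, vary_digits=True, add_context=False):
    """Return (question, answer) for one instantiation of the template."""
    rng = random.Random(seed)  # seeded so each variant is reproducible
    name = rng.choice(NAMES) if vary_names else NAMES[0]
    if vary_digits:
        a, b = rng.randint(2, 99), rng.randint(2, 99)
    else:
        a, b = 5, 3
    distractor = DISTRACTOR if add_context else ""
    question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
    return question, a + b  # the answer is recomputed from the sampled digits


# Five instantiations of the same underlying question.
variants = [instantiate(seed, add_context=(seed % 2 == 0)) for seed in range(5)]
```

Because the answer is derived from the sampled values, every variant stays solvable by the same reasoning steps, which is what lets a benchmark of this kind separate reasoning from memorized surface forms.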

Key facts

  • MGSM-Pro extends MGSM with the GSM-Symbolic approach
  • Five instantiations per question by varying names, digits, and irrelevant context
  • Evaluated across nine languages
  • Low-resource languages suffer large performance drops on digit variations
  • Robustness in high-resource languages does not transfer to low-resource languages
  • Proprietary models tested include Gemini 2.5 Flash and GPT-4.1
  • Published on arXiv with ID 2601.21225
  • Announce type: replace-cross
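
One plausible way to quantify the performance drops mentioned above is to compare accuracy on the original questions with accuracy on each instantiation. The sketch below assumes per-question correctness is available as lists of booleans; the function names and the toy data are hypothetical and do not reflect the paper's actual metric or results.

```python
from statistics import mean


def accuracy(results):
    """Fraction of correct answers in a list of booleans."""
    return sum(results) / len(results)


def robustness_gap(original, variants):
    """Return (average drop, worst-case drop) relative to the original set.

    original: list[bool] for the unmodified questions.
    variants: list of list[bool], one inner list per instantiation.
    """
    base = accuracy(original)
    per_variant = [accuracy(v) for v in variants]
    return base - mean(per_variant), base - min(per_variant)


# Toy data, invented for illustration (not benchmark results).
original = [True, True, True, False]
digit_variants = [
    [True, True, False, False],
    [True, False, False, False],
]
avg_drop, worst_drop = robustness_gap(original, digit_variants)
```

Computing the same two numbers per language would show whether a model that is stable in a high-resource language remains stable in a low-resource one.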

Entities

Institutions

  • arXiv
