ARTFEED — Contemporary Art Intelligence

MathNet Dataset Launches as Global Benchmark for Mathematical Reasoning in AI Models

ai-technology · 2026-04-22

A newly launched multimodal benchmark, MathNet, aims to assess mathematical reasoning in generative models and embedding-based retrieval systems. The dataset comprises 30,676 expert-authored problems with solutions, drawn from 47 countries and available in 17 languages. The problems span two decades of Olympiad-level mathematics competitions and cover a variety of mathematical fields. MathNet supports three distinct tasks: Problem Solving, Math-Aware Retrieval, and Retrieval-Augmented Problem Solving. It also includes an expert-curated retrieval benchmark of problem pairs that are mathematically equivalent or structurally similar. Previous benchmarks have been limited in size, language coverage, and task variety. Experimental results show that even the most advanced large language and multimodal models struggle with these tasks.

Key facts

  • MathNet is a multimodal benchmark for mathematical reasoning
  • Contains 30,676 expert-authored problems with solutions
  • Spans 47 countries and 17 languages
  • Covers two decades of Olympiad-level math competitions
  • Supports three tasks: Problem Solving, Math-Aware Retrieval, and Retrieval-Augmented Problem Solving
  • Includes retrieval benchmark with mathematically equivalent problem pairs
  • Mathematical problem solving remains challenging for large language models
  • Existing benchmarks are limited in size, language coverage, and task diversity
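The Math-Aware Retrieval task described above can be sketched as: embed each problem, then rank candidate problems by similarity to a query problem. The toy bag-of-words embedding and sample problem statements below are illustrative assumptions, not part of MathNet; a real math-aware retriever would use a learned embedding model that understands formulas and structure.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding (word-count vector); purely illustrative.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank corpus problems by similarity to the query problem.
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

# Hypothetical Olympiad-style problem statements (not taken from the dataset).
corpus = [
    "Prove that the sum of two odd integers is even",
    "Find all primes p such that p + 2 is also prime",
    "Prove that the product of two odd integers is odd",
]
query = "Show that the sum of any two odd numbers is even"
print(retrieve(query, corpus, k=1))
```

MathNet's retrieval benchmark of mathematically equivalent problem pairs would stress exactly where this lexical sketch fails: two equivalent problems can share almost no surface vocabulary, which is why embedding-based retrieval is evaluated as its own task.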
