ARTFEED — Contemporary Art Intelligence

MathNet Dataset Launches as Global Benchmark for Mathematical Reasoning in AI Models

ai-technology · 2026-04-22

A newly launched multimodal benchmark, MathNet, aims to assess mathematical reasoning in generative models and embedding-based retrieval systems. The dataset comprises 30,676 expert-authored problems with solutions, drawn from 47 countries and available in 17 languages. The problems span two decades of Olympiad-level mathematics competitions and cover a variety of mathematical fields. MathNet supports three distinct tasks: Problem Solving, Math-Aware Retrieval, and Retrieval-Augmented Problem Solving. It also includes an expert-curated retrieval benchmark of problem pairs that are mathematically equivalent or structurally similar. Previous benchmarks have been limited in size, language coverage, and task variety. Experimental results show that even the most advanced large language and multimodal models struggle with these tasks.

Key facts

  • MathNet is a multimodal benchmark for mathematical reasoning
  • Contains 30,676 expert-authored problems with solutions
  • Spans 47 countries and 17 languages
  • Covers two decades of Olympiad-level math competitions
  • Supports three tasks: Problem Solving, Math-Aware Retrieval, and Retrieval-Augmented Problem Solving
  • Includes retrieval benchmark with mathematically equivalent problem pairs
  • Mathematical problem solving remains challenging for large language models
  • Existing benchmarks are limited in size, language coverage, and task diversity
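The Math-Aware Retrieval task described above can be sketched as: embed each problem, then rank candidate problems by similarity to a query problem. The toy bag-of-words embedding and sample problem statements below are illustrative assumptions, not part of MathNet; a real math-aware retriever would use a learned embedding model that understands formulas and structure.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding (word-count vector); purely illustrative.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank corpus problems by similarity to the query problem.
    q = embed(query)
    ranked = sorted(corpus, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

# Hypothetical Olympiad-style problem statements (not taken from the dataset).
corpus = [
    "Prove that the sum of two odd integers is even",
    "Find all primes p such that p + 2 is also prime",
    "Prove that the product of two odd integers is odd",
]
query = "Show that the sum of any two odd numbers is even"
print(retrieve(query, corpus, k=1))
```

MathNet's retrieval benchmark of mathematically equivalent problem pairs would stress exactly where this lexical sketch fails: two equivalent problems can share almost no surface vocabulary, which is why embedding-based retrieval is evaluated as its own task.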
