ARTFEED — Contemporary Art Intelligence

MEDS Dataset Maps LLM Math Reasoning Across 28,000 Personas

ai-technology · 2026-05-01

Researchers have introduced MEDS (Math Education Digital Shadows), a dataset designed to evaluate how large language models reason about mathematics under human-like and AI-like conditions. The dataset comprises 28,000 personas derived from 14 LLMs, including models from the Mistral, Qwen, DeepSeek, Granite, Phi, and Grok families. Each persona is prompted with four types of math tasks: an open-ended math interview, three psychometric tests on math perceptions with explanations, cognitive networks capturing math attitudes, and 18 high-school math test questions answered with reasoning and confidence scores. Unlike traditional score-only benchmarks, MEDS integrates the concepts of self-efficacy and math anxiety to provide a richer picture of LLM mathematical capabilities and biases.
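To make the four-part structure concrete, a single persona record could be modeled as below. This is a minimal sketch under stated assumptions: the class and field names are hypothetical illustrations of the components the article describes, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class TestItem:
    """One of the 18 high-school math test questions (hypothetical shape)."""
    question_id: int
    reasoning: str     # the persona's written reasoning for its answer
    answer: str
    confidence: float  # self-reported confidence score

@dataclass
class PersonaRecord:
    """One of the 28,000 MEDS personas (hypothetical shape)."""
    source_model: str        # one of the 14 LLMs, e.g. "Qwen"
    shadow_type: str         # "human" or "ai_assistant"
    interview: str           # open-ended math interview response
    psychometric: dict       # three tests on math perceptions, with explanations
    cognitive_network: list  # word pairs capturing math attitudes
    test_items: list = field(default_factory=list)  # up to 18 TestItem entries

# Illustrative instantiation with placeholder content.
record = PersonaRecord(
    source_model="Qwen",
    shadow_type="human",
    interview="I enjoy algebra but find geometry proofs intimidating.",
    psychometric={"self_efficacy": {"score": 4, "explanation": "..."}},
    cognitive_network=[("math", "logic"), ("math", "anxiety")],
)
record.test_items.append(
    TestItem(question_id=1, reasoning="Apply the quadratic formula...",
             answer="x = 2", confidence=0.9)
)
print(record.source_model, len(record.test_items))  # → Qwen 1
```

The point of such a structure is that score-only benchmarks keep only `answer`, whereas MEDS pairs each answer with reasoning, confidence, and affective measures like self-efficacy and math anxiety.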

Key facts

  • MEDS stands for Math Education Digital Shadows.
  • The dataset includes 28,000 personas from 14 LLMs.
  • LLM families used: Mistral, Qwen, DeepSeek, Granite, Phi, Grok.
  • Personas shadow either humans or AI assistants.
  • Tasks include open math interview, psychometric tests, cognitive networks, and 18 high-school math questions.
  • Psychometric tests assess math perceptions with explanations.
  • Cognitive networks capture math attitudes.
  • MEDS goes beyond score-only benchmarks by including self-efficacy and math anxiety.
  • The dataset aims to improve understanding of how LLMs can affect math education.
