Math Takes Two: Benchmark Tests Emergent Mathematical Reasoning in Language Models
Math Takes Two is a new benchmark for evaluating whether language models can develop mathematical reasoning from first principles through communication. Unlike existing evaluations, which rely on symbolic problems grounded in established mathematical conventions, the benchmark places two agents with no prior mathematical knowledge in a visually grounded task they can solve only by inventing a shared symbolic protocol, and adopting a numerical system aids extrapolation within that task. The design is motivated by the hypothesis that human mathematical cognition co-evolved with the need for precise communication, and the benchmark aims to distinguish genuine mathematical reasoning from statistical pattern matching over formal syntax. The paper is available on arXiv under identifier 2604.21935.
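The summary does not specify the benchmark's concrete task, so as a purely hypothetical sketch, the following toy "referential game" illustrates the kind of setup described: a sender agent must describe a visually grounded quantity using arbitrary tokens that carry no prior mathematical meaning, and a receiver must reconstruct it. If the two agents converge on a positional (base-k) code, the protocol extrapolates to quantities never seen before, which is the property the benchmark reportedly rewards. The agent names, alphabet, and encoding are all illustrative assumptions, not details from the paper.

```python
# Hypothetical illustration only: the actual benchmark task is not described
# in this summary. Two agents share an arbitrary token alphabet with no
# built-in numerical meaning and use it to communicate object counts.

ALPHABET = ["ta", "mo", "ki"]  # arbitrary symbols the agents agree on
BASE = len(ALPHABET)

def sender(count: int) -> list[str]:
    """Encode a count as a positional base-3 message over the alphabet."""
    if count == 0:
        return [ALPHABET[0]]
    digits = []
    while count:
        digits.append(ALPHABET[count % BASE])
        count //= BASE
    return digits[::-1]  # most-significant symbol first

def receiver(message: list[str]) -> int:
    """Decode a message back into a count using the shared convention."""
    value = 0
    for symbol in message:
        value = value * BASE + ALPHABET.index(symbol)
    return value

# Extrapolation: counts far outside any "training" range still round-trip,
# because the positional code generalizes; a rote lookup table would not.
for n in [0, 5, 17, 1000]:
    assert receiver(sender(n)) == n
```

The point of the sketch is the contrast it encodes: a memorized symbol-to-count table fails on unseen quantities, while a compositional numerical system does not, which mirrors the benchmark's stated goal of separating genuine numerical reasoning from pattern matching.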
Key facts
- Math Takes Two is a new benchmark for emergent mathematical reasoning
- Tests two agents without prior mathematical knowledge
- Agents must develop a shared symbolic protocol
- Task is visually grounded; using a numerical system aids extrapolation
- Motivated by co-evolution of mathematical cognition and communication
- Aims to distinguish reasoning from pattern matching
- Published on arXiv with ID 2604.21935
- Challenges existing evaluations based on symbolic problems
Entities
Institutions
- arXiv