RoMathExam Dataset Compiles 130 Years of Romanian Mathematics Exams for AI Research
The RoMathExam dataset is an archive of Romanian high-school mathematics exams spanning 1895 to 2025. It comprises 10,592 mathematics problems, organized into more than 600 complete exam sets across tracks M1 to M4. Its standardized core covers nearly seven decades, from 1957 to 2025, and includes both official national exams and training variants published by the ministry. Each problem is precisely digitized and stored under a unified JSON schema with traceable provenance. The dataset also includes curriculum-aligned topic tags and dense text embeddings that support variant detection, deduplication, and similarity retrieval. To compensate for the lack of historical psychometric data, the researchers propose and validate a solution complexity metric as a scalable intrinsic proxy for that missing data, supporting AI in Education research with authentic assessment data.
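To make the idea of a unified JSON schema with provenance concrete, here is a minimal Python sketch of what one problem record and a basic validity check might look like. The dataset's actual schema is not described in this summary, so every field name, the sample problem text, and the `validate` helper are hypothetical illustrations. Only the year range and the M1 to M4 tracks come from the description above.

```python
import json

# Hypothetical record: all field names and values are illustrative
# assumptions, not the dataset's published schema.
record = {
    "id": "example-0001",
    "year": 2019,
    "track": "M1",
    "exam_type": "official",  # assumed alternative: "training_variant"
    "statement": "Solve the equation 2x + 1 = 7 over the real numbers.",
    "topics": ["equations"],
    "provenance": {"source": "ministry publication", "page": 1},
}

# Fields this sketch treats as mandatory (an assumption).
REQUIRED = {"id", "year", "track", "exam_type", "statement", "provenance"}


def validate(rec):
    """Return a list of problems found in a record (empty means it passed)."""
    errors = [f"missing field: {name}" for name in sorted(REQUIRED - rec.keys())]
    if "year" in rec and not 1895 <= rec["year"] <= 2025:
        errors.append("year outside 1895-2025")
    if "track" in rec and rec["track"] not in {"M1", "M2", "M3", "M4"}:
        errors.append("unknown track")
    return errors


# Round-trip through JSON to confirm the record is serializable as-is.
roundtrip = json.loads(json.dumps(record))
print(validate(roundtrip))
```

Keeping provenance as a nested object, as assumed here, is one way to let each problem be traced back to its source document without cluttering the top-level fields.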
Key facts
- Dataset spans 1895-2025, with a standardized core covering 1957-2025
- Contains 10,592 mathematics problems
- Organized into 600+ complete exam sets
- Covers multiple tracks (M1-M4)
- Includes official exams and training variants
- Features unified JSON schema with provenance
- Enriched with topic tags and text embeddings
- Proposes solution complexity metric
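The use of text embeddings for variant detection and deduplication can be sketched as follows. This is a minimal illustration, assuming cosine similarity with a fixed threshold. The summary does not state which embedding model, similarity measure, or threshold the dataset authors use, so the `near_duplicates` function, the 0.95 cutoff, and the toy vectors are all assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def near_duplicates(embeddings, threshold=0.95):
    """Return id pairs whose cosine similarity is at least `threshold`."""
    return [
        (a, b)
        for (a, va), (b, vb) in combinations(embeddings.items(), 2)
        if cosine(va, vb) >= threshold
    ]


# Toy 3-dimensional vectors standing in for real dense embeddings.
toy = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.99, 0.05, 0.0],
    "p3": [0.0, 1.0, 0.0],
}
print(near_duplicates(toy))
```

The pairwise loop is quadratic in the number of problems, which is workable at roughly ten thousand items but would typically be replaced by an approximate nearest-neighbor index for larger collections.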