LiveK12Bench: New Benchmark Tests LMMs on Real High School Exams
Researchers have introduced LiveK12Bench, a dynamic benchmark designed to evaluate large multimodal models (LMMs) on authentic high school-level examinations. Unlike static datasets, which are prone to contamination, LiveK12Bench comprises over 2,000 verified questions drawn from the latest real-world exam papers in Mathematics, Physics, Chemistry, and Biology. The benchmark features an automated pipeline for continuous updates and aims to reflect genuine testing environments. The design targets limitations of existing benchmarks, which are often restricted in modalities, disciplines, and evaluation criteria. The work is published on arXiv under identifier 2605.26781.
Key facts
- LiveK12Bench is a dynamic, holistic, multi-disciplinary benchmark for LMMs.
- It includes 2,000+ verified questions from real exam papers.
- Subjects covered: Mathematics, Physics, Chemistry, Biology.
- Designed to grow over time with automated updates.
- Addresses the data contamination that affects static datasets.
- Published on arXiv: 2605.26781.
- Aims to evaluate reasoning in realistic examination scenarios.
- Core innovation: automated pipeline for continuous expansion.
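
To illustrate why continuously refreshed questions help against contamination, here is a minimal sketch of one common approach: keep only questions from exams held after a model's training cutoff, then score per subject. This is not the LiveK12Bench pipeline, which the summary does not describe in detail. The `ExamQuestion` record, its field names, and both helper functions are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExamQuestion:
    # Hypothetical record; these fields are assumptions, not the benchmark's schema.
    question_id: str
    subject: str
    exam_date: date


def select_uncontaminated(questions, training_cutoff):
    """Keep only questions from exams held strictly after the training cutoff."""
    return [q for q in questions if q.exam_date > training_cutoff]


def accuracy_by_subject(questions, is_correct):
    """Compute per-subject accuracy from a mapping of question_id -> bool."""
    totals, hits = {}, {}
    for q in questions:
        totals[q.subject] = totals.get(q.subject, 0) + 1
        hits[q.subject] = hits.get(q.subject, 0) + int(is_correct[q.question_id])
    return {subject: hits[subject] / totals[subject] for subject in totals}
```

In this sketch, a question whose exam predates the cutoff is simply excluded from scoring, so a benchmark that keeps adding recent exam papers always has a pool of questions the model could not have seen during training.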
Entities
Institutions
- arXiv