JMed48k: Japanese Medical Licensing Benchmark for VLMs

ai-technology · 2026-05-23

JMed48k is a new benchmark for assessing vision-language models (VLMs) on Japanese healthcare licensing exams. Built from official PDF documents released by Japan's Ministry of Health, Labour and Welfare, it comprises 48,862 exam questions and 20,142 images from 11 national licensing exams held between 2005 and 2025, with visual content annotated under an 8-type taxonomy. An evaluation subset, JMed48k-Eval, contains 12,484 scored questions from the past five years: 9,905 text-only and 2,579 with images. The research team evaluated 21 models (proprietary, open-source, and medical-specific) and reported performance separately for text-only and with-image questions. A paired image-removal audit compared answers to the same questions with and without their images and sorted the outcomes into four answer-transition states.

Key facts

JMed48k contains 48,862 exam questions and 20,142 images from 11 Japanese national licensing examinations (2005–2025).
Built from official PDF materials released by the Japanese Ministry of Health, Labour and Welfare.
Visual content is annotated under an 8-type taxonomy.
JMed48k-Eval subset has 12,484 scored questions: 9,905 text-only and 2,579 with images.
21 models (proprietary, open-source, medical-specific) were evaluated.
A paired image-removal audit was conducted to study four answer-transition states.
The benchmark is designed for vision-language model evaluation in Japanese healthcare.
Performance is reported separately for text-only and with-image questions.
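
The paired image-removal audit can be illustrated with a small sketch. The source does not define the four answer-transition states, so this assumes the natural 2x2 reading: whether a model answers a question correctly with the image, crossed with whether it answers correctly once the image is removed. The state names, the function names, and the toy data are all hypothetical and are not taken from the benchmark.

```python
from collections import Counter

# Assumption: the four states are the 2x2 combinations of
# (correct with image, correct without image). Labels are hypothetical.
def transition_state(correct_with_image: bool, correct_without_image: bool) -> str:
    """Classify one paired result into one of four assumed transition states."""
    if correct_with_image and correct_without_image:
        return "stays_correct"      # image was not needed to answer
    if correct_with_image and not correct_without_image:
        return "correct_to_wrong"   # removing the image broke the answer
    if not correct_with_image and correct_without_image:
        return "wrong_to_correct"   # removing the image fixed the answer
    return "stays_wrong"            # wrong either way


def audit(paired_results):
    """Count transition states over (with_image, without_image) correctness pairs."""
    return Counter(transition_state(w, wo) for w, wo in paired_results)


# Toy data, invented for illustration only (not benchmark results).
toy_pairs = [
    (True, True),
    (True, False),
    (False, True),
    (False, False),
    (True, False),
]
counts = audit(toy_pairs)
```

Under this reading, a large "stays_correct" share would suggest that many with-image questions can be answered from text alone, while "correct_to_wrong" would indicate cases where the model appears to rely on the image.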

Entities

Institutions

Japanese Ministry of Health, Labour and Welfare
arXiv

Locations

Japan

Sources

arXiv cs.AI — 2026-05-23