DALPHIN Benchmark Tests AI Copilots Against Pathologists

other · 2026-05-07

DALPHIN, a newly introduced open benchmark, evaluates AI copilots in digital pathology by comparing them with human pathologists. The dataset comprises 1,236 images from 300 cases, covering 130 diagnoses across 14 subspecialties and six countries. The human baseline was established by 31 specialists from 10 countries. Three AI copilots were evaluated: GPT-5, Gemini 2.5 Pro, and PathChat+. PathChat+ showed no statistically significant difference from expert performance on four of the six tasks, while Gemini 2.5 Pro matched experts on two and GPT-5 on one. The DALPHIN benchmark is now publicly available.
