LUNGUAGE: Benchmark for Structured Chest X-ray Interpretation

other · 2026-04-30

Researchers have released LUNGUAGE, a benchmark dataset for structured radiology report generation that supports both single-report evaluation and longitudinal patient-level assessment across multiple studies. The dataset contains 1,473 annotated chest X-ray reports reviewed by experts, 186 of which carry longitudinal annotations that capture disease progression and the intervals between studies. A two-stage structuring framework converts generated reports into fine-grained, schema-aligned structured reports. The researchers also introduce LUNGUAGESCORE, an interpretable evaluation metric.

Key facts

LUNGUAGE is a benchmark dataset for structured radiology report generation.
It supports single-report evaluation and longitudinal patient-level assessment.
It contains 1,473 annotated chest X-ray reports reviewed by experts.
186 reports have longitudinal annotations for disease progression.
A two-stage structuring framework transforms reports into structured formats.
LUNGUAGESCORE is an interpretable metric for evaluation.
The benchmark addresses the limitations of existing coarse evaluation metrics.
It focuses on fine-grained clinical semantics and temporal dependencies.
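
To make the difference between single-report and patient-level views concrete, here is a minimal Python sketch. It is not the LUNGUAGE schema or tooling: the `Finding` record, its field names, and both helper functions are hypothetical, chosen only to illustrate how structured findings could be grouped per patient and ordered in time so that intervals between studies can be read off.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    # Hypothetical fields for illustration; not the actual LUNGUAGE schema.
    patient_id: str
    study_date: date
    entity: str    # e.g. "pleural effusion"
    location: str  # e.g. "left lower lobe"
    status: str    # e.g. "present" or "absent"


def patient_timelines(findings):
    """Group findings by patient and sort each group by study date."""
    grouped = defaultdict(list)
    for f in findings:
        grouped[f.patient_id].append(f)
    return {
        pid: sorted(fs, key=lambda f: f.study_date)
        for pid, fs in grouped.items()
    }


def intervals_in_days(timeline):
    """Return the days between consecutive distinct study dates."""
    dates = sorted({f.study_date for f in timeline})
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
```

A single-report evaluation would look at the findings of one study in isolation, while a patient-level evaluation would work on the ordered timeline returned by `patient_timelines`.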
