CT-FineBench: New Benchmark for Fine-Grained CT Report Evaluation

other · 2026-04-29

CT-FineBench is a diagnostic-fidelity benchmark for fine-grained evaluation of CT report generation. It targets a gap in conventional metrics such as lexical overlap and entity matching, which fail to capture granular diagnostic accuracy. Built from the CT-RATE and Merlin datasets, the benchmark uses a question-answering (QA) process: it identifies key clinical attributes (e.g., location, size, margin) and turns them into QA pairs that probe specific details grounded in gold standards. The aim is to improve factual consistency in automated CT reporting.

Key facts

CT-FineBench is a benchmark for fine-grained evaluation of CT report generation.
It addresses limitations of conventional metrics like lexical overlap or entity matching.
Built from the CT-RATE and Merlin datasets.
Uses a QA-based process to identify and structure clinical attributes.
Attributes include location, size, margin, and other disease-oriented details.
Questions probe specific clinical details grounded in gold standards.
Aims to improve diagnostic accuracy in automated CT reporting.
Focuses on factual consistency in generated reports.
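
To make the QA-based idea above concrete, here is a minimal Python sketch of how attributes such as location, size, and margin could be turned into QA pairs and used to score a generated report. It is an illustration only, not the benchmark's actual pipeline: the `Finding` and `QAPair` types, the question template, and the exact-match `fidelity_score` are all hypothetical, and the source does not say how CT-FineBench phrases questions or grades answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One finding taken from a gold-standard report (hypothetical schema)."""
    name: str
    attributes: dict[str, str]  # attribute name -> gold value


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def build_qa_pairs(finding: Finding) -> list[QAPair]:
    """Turn every attribute of a finding into one question with its gold answer."""
    return [
        QAPair(question=f"What is the {attr} of the {finding.name}?", answer=value)
        for attr, value in finding.attributes.items()
    ]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def fidelity_score(gold: list[QAPair], predicted: dict[str, str]) -> float:
    """Fraction of questions answered correctly.

    Exact match after normalization is an assumption made for this sketch;
    a real benchmark may grade answers differently.
    """
    if not gold:
        return 0.0
    correct = sum(
        1
        for qa in gold
        if normalize(predicted.get(qa.question, "")) == normalize(qa.answer)
    )
    return correct / len(gold)


# Usage: answers extracted from a generated report (made-up values).
nodule = Finding(
    name="nodule",
    attributes={"location": "right upper lobe", "size": "8 mm", "margin": "spiculated"},
)
pairs = build_qa_pairs(nodule)
answers = {
    "What is the location of the nodule?": "Right upper lobe",
    "What is the size of the nodule?": "6 mm",  # wrong size
    "What is the margin of the nodule?": "spiculated",
}
print(fidelity_score(pairs, answers))  # 2 of the 3 answers match the gold values
```

In this sketch a report that names the right finding but gets its size wrong still loses credit on the size question. That kind of attribute-level error is what lexical overlap and entity matching tend to miss.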

Sources

arXiv cs.AI — 2026-04-28