TIF-GRPO: Trajectory-Integral Feedback for CT Analysis
Trajectory-Integral Feedback GRPO (TIF-GRPO) is a new reinforcement learning method that addresses evaluation hallucinations in medical vision-language models for 3D CT analysis. It uses the Clinical Abnormality Benchmarking Substrate (CABS) to decompose radiology reports into verifiable clinical units. This is meant to correct a mechanistic divergence in which rewards based on surface similarity can be earned without getting the medical facts right. The paper is available on arXiv (2605.20277) and aims to improve diagnostic accuracy in volumetric CT analysis.
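The gap between surface similarity and clinical correctness can be illustrated with a toy comparison. The following is a minimal sketch, not the paper's reward: the `(finding, location, present)` unit format, the `token_overlap` and `unit_f1` helpers, and the example reports are all assumptions made for illustration, and the units are written by hand rather than extracted by CABS.

```python
from typing import FrozenSet, Tuple

# Hypothetical unit format: (finding, location, present?). Not taken from the paper.
Unit = Tuple[str, str, bool]


def token_overlap(candidate: str, reference: str) -> float:
    """Surface similarity: Jaccard overlap of lowercased word sets."""
    a, b = set(candidate.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def unit_f1(predicted: FrozenSet[Unit], reference: FrozenSet[Unit]) -> float:
    """Fact-level score: F1 over exactly matching clinical units."""
    if not predicted and not reference:
        return 1.0
    matched = len(predicted & reference)
    if matched == 0:
        return 0.0
    precision = matched / len(predicted)
    recall = matched / len(reference)
    return 2 * precision * recall / (precision + recall)


# Invented example: the candidate flips one finding but reuses almost every word.
reference_text = "There is a nodule in the right upper lobe. No pleural effusion."
candidate_text = "There is no nodule in the right upper lobe. No pleural effusion."

# Hand-written units standing in for a decomposition step.
reference_units = frozenset({
    ("nodule", "right upper lobe", True),
    ("pleural effusion", "pleura", False),
})
candidate_units = frozenset({
    ("nodule", "right upper lobe", False),
    ("pleural effusion", "pleura", False),
})

surface_score = token_overlap(candidate_text, reference_text)
fact_score = unit_f1(candidate_units, reference_units)
print(f"surface similarity: {surface_score:.2f}, unit-level F1: {fact_score:.2f}")
```

In this toy case the candidate report contradicts the reference on the nodule yet keeps a high word overlap, while the unit-level score drops because only one of the two units matches.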
Key facts
- TIF-GRPO is a new reinforcement learning method for medical vision-language models (VLMs).
- It addresses evaluation hallucinations in 3D CT analysis.
- CABS decomposes radiology reports into verifiable clinical semantic units.
- Standard RL with surface-similarity rewards suffers from mechanistic divergence: it optimizes fluency over clinical correctness.
- The paper is available on arXiv with ID 2605.20277.
- The method aims to improve diagnostic accuracy in CT analysis.
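For context on the GRPO part of the name: in standard GRPO (Group Relative Policy Optimization), several responses are sampled for the same input and each reward is normalized against the group's mean and standard deviation. The sketch below shows only that generic step. The `group_relative_advantages` helper, the `eps` term, the use of the population standard deviation, and the reward values are assumptions for illustration; this summary does not say how TIF-GRPO computes or modifies its rewards and advantages.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its sampling group (generic GRPO-style step)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps avoids division by zero when every response in the group scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Invented rewards for four sampled reports on the same CT volume.
rewards = [1.0, 0.5, 0.5, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above the group average receive a positive advantage and those below receive a negative one, so the quality of the reward signal directly determines which reports are reinforced.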
Entities
Institutions
- arXiv