MedSAE Improves Interpretability of MedCLIP for Chest Radiographs

ai-technology · 2026-05-25

Researchers have introduced Medical Sparse Autoencoders (MedSAEs) to improve the interpretability of MedCLIP, a vision-language model trained on chest radiographs and their corresponding reports. The paper, posted on arXiv, proposes an evaluation framework that combines correlation metrics, entropy analyses, and automated neuron naming via the MedGemma foundation model. Experiments on the CheXpert dataset show that MedSAE neurons are more monosemantic and interpretable than the raw MedCLIP features. The authors present the work as bridging high-performing medical AI and transparency, and as a scalable step toward clinically trustworthy representations. The source code is publicly available.
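For readers unfamiliar with the technique, the following is a minimal sketch of a generic sparse autoencoder, not the MedSAE implementation. It assumes a standard design: a ReLU encoder that expands an embedding into a wider hidden layer, a linear decoder, and a loss that adds an L1 sparsity penalty to the reconstruction error. The dimensions, the `l1_coeff` value, and all function names are illustrative assumptions; the paper's actual architecture and training objective may differ.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode x into non-negative hidden activations z, then reconstruct it."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    z = [max(0.0, p) for p in pre]  # ReLU keeps activations non-negative
    x_hat = [r + b for r, b in zip(matvec(W_dec, z), b_dec)]
    return z, x_hat


def sae_loss(x, z, x_hat, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on z.

    l1_coeff is a hypothetical value chosen for illustration.
    """
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(a) for a in z)
    return mse + l1_coeff * l1


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Toy stand-in for an image embedding (assumed 4-dimensional here).
rng = random.Random(0)
d_in, d_hidden = 4, 16  # hidden layer is wider than the input
W_enc = init_matrix(d_hidden, d_in, rng)
W_dec = init_matrix(d_in, d_hidden, rng)
b_enc = [0.0] * d_hidden
b_dec = [0.0] * d_in

x = [0.5, -1.0, 0.25, 2.0]
z, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, z, x_hat)
```

The sparsity penalty pushes most entries of `z` toward zero, which is what encourages each hidden neuron to respond to a narrow set of inputs.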

Key facts

MedSAE stands for Medical Sparse Autoencoders.
MedCLIP is a vision-language model trained on chest radiographs and reports.
The evaluation framework uses correlation metrics, entropy analyses, and automated neuron naming via MedGemma.
Experiments were conducted on the CheXpert dataset.
MedSAE neurons show higher monosemanticity and interpretability than raw MedCLIP features.
The research aims to improve transparency in medical AI.
The source code is publicly available.
The paper is categorized under Computer Science > Artificial Intelligence.
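
The entropy analysis mentioned above can be illustrated with a small sketch. This is one plausible way to score monosemanticity, not necessarily the metric used in the paper: it measures how a neuron's total activation is spread across class labels and reports the normalized Shannon entropy, so values near 0 mean the neuron fires for one class and values near 1 mean it fires evenly across classes. The function name, the single-label simplification, and the toy data are all assumptions.

```python
import math
from collections import defaultdict


def activation_entropy(activations, labels):
    """Normalized entropy of a neuron's activation mass over class labels.

    Returns a value in [0, 1], or None if the neuron never fires.
    Assumes one label per sample, which is a simplification.
    """
    mass = defaultdict(float)
    for a, label in zip(activations, labels):
        mass[label] += max(0.0, a)  # only positive activations count

    total = sum(mass.values())
    if total == 0.0:
        return None  # dead neuron: entropy is undefined

    n_classes = len(set(labels))
    if n_classes < 2:
        return 0.0

    probs = [m / total for m in mass.values() if m > 0.0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return abs(entropy) / math.log2(n_classes)


# Toy data: four samples, two classes.
labels = ["Cardiomegaly", "Cardiomegaly", "Edema", "Edema"]
selective = activation_entropy([1.0, 1.0, 0.0, 0.0], labels)  # fires for one class
mixed = activation_entropy([1.0, 1.0, 1.0, 1.0], labels)      # fires for both
```

Under this scoring, a lower value for sparse-autoencoder neurons than for raw features would be consistent with the reported gain in monosemanticity.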
