ARTFEED — Contemporary Art Intelligence

Multimodal MRI and Tabular Data Synthesis via Diffusion Model

ai-technology · 2026-05-11

A team of researchers has introduced a multimodal latent diffusion model that integrates volumetric magnetic resonance imaging (MRI) with tabular clinical data in a unified latent space via cross-attention. A variational autoencoder fuses the MRI and tabular modalities before diffusion-based synthesis, with a separate decoder for each data type. The model was evaluated on data from the German National Cohort (NAKO Gesundheitsstudie), which includes over 10,000 participants. The generated MRI volumes showed anatomical plausibility and body composition consistent with the jointly synthesized tabular attributes, including age, sex, body measurements, and ethnicity. Fréchet distance and precision-recall metrics were used for quantitative assessment.
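To make the architecture concrete, here is a minimal PyTorch sketch of the fusion idea described above: an encoder per modality, a cross-attention step that lets the image representation attend to the tabular one, a shared Gaussian latent, and separate decoders. All layer names, dimensions, and the use of single-token attention are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class MultimodalVAE(nn.Module):
    """Hypothetical sketch of a cross-attention VAE fusing an MRI
    embedding with tabular features into one latent, then decoding
    each modality separately (dims and layers are illustrative)."""

    def __init__(self, img_dim=512, tab_dim=8, latent_dim=64):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, latent_dim)
        self.tab_enc = nn.Linear(tab_dim, latent_dim)
        # Cross-attention: the image token queries the tabular token.
        self.cross_attn = nn.MultiheadAttention(
            latent_dim, num_heads=4, batch_first=True
        )
        self.to_mu = nn.Linear(latent_dim, latent_dim)
        self.to_logvar = nn.Linear(latent_dim, latent_dim)
        # Separate decoders per modality, as in the article.
        self.img_dec = nn.Linear(latent_dim, img_dim)
        self.tab_dec = nn.Linear(latent_dim, tab_dim)

    def forward(self, img, tab):
        q = self.img_enc(img).unsqueeze(1)    # (B, 1, D) query
        kv = self.tab_enc(tab).unsqueeze(1)   # (B, 1, D) key/value
        fused, _ = self.cross_attn(q, kv, kv) # joint representation
        fused = fused.squeeze(1)
        mu, logvar = self.to_mu(fused), self.to_logvar(fused)
        # Reparameterization trick: sample z from N(mu, sigma^2).
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.img_dec(z), self.tab_dec(z), mu, logvar
```

In the full pipeline a diffusion model would then be trained on the shared latent `z`; this fragment only illustrates the fusion and dual-decoder structure.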

Key facts

  • Multimodal latent diffusion model proposed for joint synthesis of MRI and tabular data
  • Cross-attention mechanism enables coherent joint representation learning
  • Variational autoencoder fuses modalities before diffusion
  • Separate decoders for MRI and tabular data reconstruction
  • Evaluated on German National Cohort (NAKO Gesundheitsstudie) with over 10,000 participants
  • Tabular features include age, sex, body measurements, ethnicity
  • Generated MRI volumes exhibit anatomical plausibility and consistent body composition
  • Quantitative evaluation uses Fréchet distance and precision-recall metrics
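The Fréchet distance mentioned in the last point is commonly computed FID-style, by fitting a Gaussian to real and generated feature sets and comparing the two. A small NumPy/SciPy sketch of that computation follows; the article does not specify the exact estimator, so treat this as one standard formulation rather than the authors' code.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets
    (FID-style): ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary numerical residue
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```

For identical feature sets the distance is (numerically) zero; it grows as the generated distribution drifts from the real one in mean or covariance.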

Entities

Institutions

  • German National Cohort (NAKO Gesundheitsstudie)

Locations

  • Germany
