CausalFlow-T: Estimating Treatment Effects from Incomplete EHR Data
CausalFlow-T is a new two-stage pipeline for estimating treatment effects from incomplete longitudinal electronic health records (EHRs). It combines a normalizing flow constrained by a directed acyclic graph (DAG) with an LSTM encoding of patient history. Because the flow is invertible, counterfactual inference is exact, which avoids the approximation errors of variational inference. Confounding is separated out through the explicit causal structure. The method is aimed at biomarkers that are missing not at random (MNAR), for which missingness of 50%–80% is common in EHRs. It is validated on four synthetic benchmarks and one semi-synthetic benchmark with known counterfactuals, and the results show that the DAG constraints improve robustness. The work is published on arXiv (2605.05125) and is intended for target trial emulation (TTE), which addresses causal questions when randomized controlled trials are infeasible.
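To illustrate what exact invertible counterfactual inference means, here is a minimal sketch in plain Python. It is not the authors' implementation: the three-node DAG (X -> T, X -> Y, T -> Y), the fixed affine parameters, and all function names are hypothetical stand-ins for the learned, LSTM-conditioned flow described above. The sketch only shows the general pattern: invert the flow to recover the noise behind an observed record, intervene on the treatment, then push the same noise forward again.

```python
import math

# Hypothetical toy DAG, listed in topological order: X -> T, X -> Y, T -> Y.
PARENTS = {"X": [], "T": ["X"], "Y": ["X", "T"]}
ORDER = ["X", "T", "Y"]

# Hypothetical fixed parameters standing in for learned conditioner networks.
# Each variable is an affine function of its own noise term:
#   value = shift(parents) + exp(log_scale) * noise
WEIGHTS = {"X": {}, "T": {"X": 0.5}, "Y": {"X": 1.0, "T": 2.0}}
BIAS = {"X": 0.0, "T": 0.0, "Y": 1.0}
LOG_SCALE = {"X": 0.0, "T": 0.0, "Y": math.log(0.5)}


def shift(name, values):
    """Shift term for one node; it reads only that node's DAG parents."""
    return BIAS[name] + sum(WEIGHTS[name][p] * values[p] for p in PARENTS[name])


def forward(noise, interventions=None):
    """Map noise to variables in topological order; interventions override nodes."""
    interventions = interventions or {}
    values = {}
    for name in ORDER:
        if name in interventions:
            values[name] = interventions[name]  # do(name = value)
        else:
            values[name] = shift(name, values) + math.exp(LOG_SCALE[name]) * noise[name]
    return values


def inverse(values):
    """Recover the noise behind an observed record (exact, no approximation)."""
    return {
        name: (values[name] - shift(name, values)) * math.exp(-LOG_SCALE[name])
        for name in ORDER
    }


def counterfactual(observed, interventions):
    """Abduction (invert), action (intervene), prediction (forward pass)."""
    noise = inverse(observed)
    return forward(noise, interventions)


observed = {"X": 1.0, "T": 0.0, "Y": 2.5}
print(counterfactual(observed, {"T": 1.0}))
```

For this toy record, the counterfactual outcome under T = 1 is 4.5 (up to floating-point rounding), compared with the observed 2.5, so the individual effect equals the hypothetical T -> Y weight of 2.0. Because each shift term reads only DAG parents, the confounder X keeps its observed value under the intervention.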
Key facts
- CausalFlow-T is a two-stage pipeline for treatment effect estimation from incomplete longitudinal EHRs.
- It uses a DAG-constrained normalizing flow with LSTM-encoded patient history.
- The method performs exact invertible counterfactual inference.
- It avoids approximation errors from variational inference.
- The approach separates confounding through explicit causal structure.
- MNAR biomarkers in EHRs can reach 50%–80% missingness.
- Validation was done on four synthetic benchmarks and one semi-synthetic benchmark with known counterfactuals.
- The paper is published on arXiv with ID 2605.05125.
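To make the MNAR fact above concrete, the following sketch simulates a single biomarker whose chance of being recorded depends on its own unobserved value. The logistic missingness mechanism, its coefficients, and the function names are assumptions chosen for illustration only; they are not taken from the paper or its benchmarks.

```python
import math
import random


def simulate_mnar(n=10_000, seed=0):
    """Draw biomarker values and hide each one with a value-dependent probability."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = []
    for v in values:
        # Hypothetical mechanism: lower values are less likely to be recorded.
        p_missing = 1.0 / (1.0 + math.exp(-(0.6 - 1.5 * v)))
        observed.append(None if rng.random() < p_missing else v)
    return values, observed


def summarize(values, observed):
    """Return the missing rate, the true mean, and the mean of recorded values."""
    recorded = [v for v in observed if v is not None]
    missing_rate = 1.0 - len(recorded) / len(observed)
    true_mean = sum(values) / len(values)
    observed_mean = sum(recorded) / len(recorded)
    return missing_rate, true_mean, observed_mean


values, observed = simulate_mnar()
missing_rate, true_mean, observed_mean = summarize(values, observed)
print(f"missing rate:  {missing_rate:.2f}")
print(f"true mean:     {true_mean:.2f}")
print(f"observed mean: {observed_mean:.2f}")
```

With these assumed coefficients the missing rate falls inside the 50%–80% range, and the mean of the recorded values sits well above the true mean. That gap is why MNAR data cannot be handled by simply analysing the recorded values.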
Entities
Institutions
- arXiv