FeatEHR-LLM: LLM-Based Feature Engineering for EHR Data

ai-technology · 2026-04-27

A new framework called FeatEHR-LLM uses large language models to generate clinically meaningful features from irregularly sampled electronic health record (EHR) time series. It targets three properties of real-world EHR data: irregular observation intervals, variable measurement frequencies, and structural sparsity. To protect patient privacy, the LLM sees only dataset schemas and task descriptions, never raw records. A tool-augmented generation mechanism lets the LLM produce executable feature-extraction code that accounts for uneven observation patterns and informative sparsity, where the absence of a measurement can itself carry signal. The framework is presented in a preprint on arXiv (2604.22534).
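To make the idea of sparsity-aware features concrete, here is a minimal sketch of the kind of extraction code such a system might emit for a single lab variable. It is an illustration under stated assumptions, not code from the preprint: the function name `irregularity_features`, the feature names, and the convention of measuring time in hours since admission are all hypothetical.

```python
from statistics import mean


def irregularity_features(times, values, window_end):
    """Summarize one irregularly sampled variable for one patient.

    Hypothetical helper, not taken from the FeatEHR-LLM preprint.
    times: observation times in hours since admission, sorted ascending.
    values: the measurement recorded at each time.
    window_end: end of the observation window, in the same units.
    """
    if not times:
        # The variable was never measured: keep that fact as a feature
        # instead of imputing a value.
        return {
            "measured": 0,
            "count": 0,
            "last_value": None,
            "hours_since_last": None,
            "mean_gap_hours": None,
        }

    # Gaps between consecutive observations capture sampling irregularity.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "measured": 1,
        "count": len(times),
        "last_value": values[-1],
        "hours_since_last": window_end - times[-1],
        "mean_gap_hours": mean(gaps) if gaps else None,
    }


features = irregularity_features([1.0, 2.0, 6.0], [1.1, 1.4, 2.0], 24.0)
never_measured = irregularity_features([], [], 24.0)
```

The `measured` flag and the measurement count encode whether and how often clinicians ordered the test, which is one common way to expose informative sparsity to a downstream model.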

Key facts

FeatEHR-LLM leverages LLMs for feature engineering in EHR data
Addresses irregular observation intervals and variable measurement frequencies
LLM operates only on schemas and task descriptions to protect privacy
Tool-augmented generation produces executable feature-extraction code
Handles uneven observation patterns and informative sparsity
Preprint available on arXiv with ID 2604.22534
Existing automated methods lack clinical awareness or assume clean inputs
Framework targets real-world EHR data challenges
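
The schema-only constraint listed above can be sketched as a prompt builder that receives column metadata and a task description but no patient rows. This is an assumption-laden illustration rather than the framework's actual interface: `build_schema_prompt`, the schema dictionary keys, and the prompt wording are all hypothetical.

```python
def build_schema_prompt(schema, task):
    """Assemble an LLM prompt from column metadata only.

    Hypothetical helper, not taken from the FeatEHR-LLM preprint.
    schema: list of dicts with 'name', 'dtype', and 'description' keys.
    task: plain-language description of the prediction task.
    No patient records are passed in, so none can reach the prompt.
    """
    lines = [f"Task: {task}", "Columns (name: type, description):"]
    for column in schema:
        lines.append(
            f"- {column['name']}: {column['dtype']}, {column['description']}"
        )
    lines.append(
        "Write a Python function extract_features(df) that uses only these columns."
    )
    return "\n".join(lines)


example_schema = [
    {"name": "charttime", "dtype": "datetime", "description": "time of measurement"},
    {"name": "lactate", "dtype": "float", "description": "serum lactate, mmol/L"},
]
prompt = build_schema_prompt(example_schema, "predict in-hospital mortality")
```

Because the function signature accepts only metadata, the privacy boundary is enforced by construction: code returned by the model would then run locally against the real records.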
