LLMs for Clinical Trial Screening: RAG Improves Evidence Localization in EHR Narratives

other · 2026-04-30

A recent arXiv study (2604.05190) investigates whether large language models (LLMs) can streamline patient screening from longitudinal electronic health record (EHR) narratives for clinical trial recruitment. Motivated by the persistent problem of trial under-enrollment, it evaluates both encoder- and decoder-based generative LLMs, covering general-purpose models as well as models adapted for medical use. To address the 'Lost in the Middle' issue in lengthy documents, it compares three strategies: the original long-context window, NER-based extractive summarization, and retrieval-augmented generation (RAG) for on-the-fly retrieval of evidence aligned with the eligibility criteria. The evaluation uses the 2018 N2C2 Track 1 benchmark dataset. The version summarized here is 2604.05190v2, announced on arXiv as a replace-cross.
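
To make the RAG strategy more concrete, here is a minimal sketch of criterion-aligned evidence retrieval: a long note is split into small chunks, each chunk is scored against one eligibility criterion, and only the best-matching chunks would be passed on to the LLM. This is an illustration under stated assumptions, not the study's implementation: the TF-IDF-style lexical scoring stands in for whatever retriever the authors use, and the function names, the sample note, and the chunk size are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, made-up note used only for illustration (not from N2C2).
EXAMPLE_NOTE = (
    "Patient seen for routine follow-up. Blood pressure well controlled on lisinopril. "
    "She reports no chest pain today. Diet and exercise were discussed. "
    "Past history includes myocardial infarction in 2015. Aspirin was continued. "
    "Return to clinic in three months. Flu vaccine given."
)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_into_chunks(narrative, max_sentences=2):
    """Split a narrative into chunks of at most `max_sentences` sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", narrative) if s.strip()]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def retrieve_evidence(narrative, criterion, top_k=2):
    """Return up to `top_k` chunks that best match the criterion text.

    Scoring is a simple TF-IDF-style overlap (an assumption made for this
    sketch); chunks that share no terms with the criterion are dropped.
    """
    chunks = split_into_chunks(narrative)
    tokenized = [tokenize(chunk) for chunk in chunks]
    n_chunks = len(chunks)

    # Document frequency: in how many chunks each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(criterion))
    scored = []
    for idx, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = sum(
            counts[term] * math.log(1 + n_chunks / doc_freq[term])
            for term in query_terms
            if term in counts
        )
        if score > 0:
            scored.append((-score, idx))

    # Highest score first; ties keep the original document order.
    scored.sort()
    return [chunks[idx] for _, idx in scored[:top_k]]


if __name__ == "__main__":
    for chunk in retrieve_evidence(EXAMPLE_NOTE, "history of myocardial infarction"):
        print(chunk)
```

In a full pipeline the retrieved chunks, rather than the entire record, would be placed in the prompt alongside the criterion, which keeps the relevant evidence from being buried in the middle of a long context. A dense embedding retriever could replace the lexical scorer without changing the overall shape of the sketch.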

Key facts

Study systematically explores LLMs for clinical trial screening from EHR narratives.
Compares encoder- and decoder-based generative LLMs.
Examines general-purpose and medical-adapted LLMs.
Addresses 'Lost in the Middle' issue in long documents.
Tests three strategies: original long-context, NER-based summarization, and RAG.
Uses 2018 N2C2 Track 1 benchmark dataset for evaluation.
Published on arXiv with ID 2604.05190v2.
Announcement type is replace-cross.
