First Comparative Study of DP, NER, and LLMs for Dutch Clinical Text De-identification

ai-technology · 2026-04-25

A new preprint on arXiv (arXiv:2604.21421) presents the first comparative evaluation of differential privacy (DP), named entity recognition (NER), and large language models (LLMs) for de-identifying Dutch clinical notes. The study assesses each method on its own and in hybrid combinations in which NER or LLM preprocessing is applied before DP. Performance is measured in terms of privacy leakage and utility, with the aim of balancing formal privacy guarantees against practical usability under regulations such as GDPR and HIPAA.

The work is motivated by the high cost and slow pace of manual de-identification, which remains the gold standard but is impractical for large-scale secondary use of healthcare data. Automated pipelines typically rely on NER to identify protected entities for redaction, DP offers formal privacy guarantees, and LLMs have recently emerged as a further tool for clinical text de-identification. By systematically comparing all three approaches on Dutch-language data, the study fills a gap in non-English clinical NLP. Its findings are expected to inform the development of more efficient, privacy-preserving automated de-identification systems for Dutch healthcare records.

Key facts

First comparative study of DP, NER, and LLMs for Dutch clinical text de-identification
Published as arXiv:2604.21421
Evaluates methods individually and in hybrid strategies
Hybrid strategies apply NER or LLM preprocessing before DP
Performance assessed by privacy leakage and utility
Motivated by GDPR and HIPAA compliance
Manual de-identification is gold standard but costly and slow
Study focuses on Dutch-language clinical notes
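
To make the hybrid idea concrete, the following is a minimal Python sketch of "redact first, then apply DP". It is not the study's pipeline: the regex patterns stand in for a trained NER model, the DP step is k-ary randomized response over a tiny hypothetical vocabulary (a simple mechanism that gives a per-token local DP guarantee), and every name in it (PATTERNS, VOCAB, hybrid_deidentify) is an assumption made for illustration.

```python
import math
import random
import re

# Hypothetical stand-in for an NER model: two regex patterns for entity types.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b"),
    "PHONE": re.compile(r"\b0\d{9}\b"),
}

# Hypothetical closed vocabulary; anything outside it is mapped to "<unk>".
VOCAB = ["patient", "gezien", "op", "tel", "[DATE]", "[PHONE]", "<unk>"]


def redact_entities(text):
    """Step 1 (NER stand-in): replace matched entities with type placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def randomized_response(tokens, vocab, epsilon, rng):
    """Step 2 (DP stand-in): k-ary randomized response per token.

    Each token is kept with probability e^eps / (e^eps + k - 1); otherwise it
    is replaced by a uniformly chosen different vocabulary word.
    """
    k = len(vocab)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    noisy = []
    for tok in tokens:
        if rng.random() < p_keep:
            noisy.append(tok)
        else:
            noisy.append(rng.choice([w for w in vocab if w != tok]))
    return noisy


def hybrid_deidentify(text, vocab, epsilon, rng):
    """Redact entities, tokenize, map out-of-vocabulary tokens, then add noise."""
    redacted = redact_entities(text)
    tokens = re.findall(r"\[\w+\]|\w+", redacted)
    in_vocab = [t if t in vocab else "<unk>" for t in tokens]
    return " ".join(randomized_response(in_vocab, vocab, epsilon, rng))


if __name__ == "__main__":
    note = "patient gezien op 12-03-2021, tel 0612345678"
    print(hybrid_deidentify(note, VOCAB, epsilon=2.0, rng=random.Random(0)))
```

In this sketch, a smaller epsilon replaces more tokens (stronger privacy, lower utility) and a larger epsilon leaves the redacted text mostly intact. That is the kind of privacy-versus-utility trade-off the study measures, although its actual mechanisms and metrics may differ.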
