First Comparative Study of DP, NER, and LLMs for Dutch Clinical Text De-identification
A new preprint on arXiv (arXiv:2604.21421) presents the first comparative evaluation of differential privacy (DP), named entity recognition (NER), and large language models (LLMs) for de-identifying Dutch clinical notes. The study assesses each method on its own and in hybrid combinations in which NER or LLM preprocessing is applied before DP. Performance is measured by privacy leakage and utility, with the aim of balancing formal privacy guarantees against practical usability under regulations such as GDPR and HIPAA.
The work is motivated by the cost and slowness of manual de-identification, which remains the gold standard but is impractical for large-scale secondary use of healthcare data. Automated pipelines typically rely on NER to find protected entities and redact them, while DP offers formal privacy guarantees, and LLMs have more recently emerged as a further option for clinical text de-identification. By systematically comparing all three approaches on Dutch-language data, the study fills a gap in non-English clinical NLP. Its findings are expected to inform the development of more efficient, privacy-preserving automated de-identification systems for Dutch healthcare records.
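To make the hybrid idea concrete, the following is a minimal Python sketch of "redact first, then apply DP". It is not the study's implementation: the summary does not say which NER model, LLM, or DP mechanism the authors use. Here the regex patterns in `PATTERNS` stand in for a trained NER model, k-ary randomized response over a tiny vocabulary stands in for the DP step, and the `leakage` function is a hypothetical proxy for privacy leakage. All names, the example note, and the vocabulary are assumptions made for illustration.

```python
import math
import random
import re

# Hypothetical regex patterns standing in for a trained NER model.
PATTERNS = {
    "DATE": re.compile(r"\b\d{2}-\d{2}-\d{4}\b"),
    "NAME": re.compile(r"\b(?:dhr\.|mevr\.)\s+[A-Z][a-z]+"),
}

def redact(text, patterns=PATTERNS):
    """Replace each pattern match with a placeholder tag such as [NAME]."""
    for label, pattern in patterns.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def keep_probability(epsilon, vocab_size):
    """Chance of reporting the true token under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + vocab_size - 1)

def randomized_response(tokens, vocab, epsilon, rng):
    """Perturb each token independently (a per-token guarantee only).

    Out-of-vocabulary tokens are mapped to vocab[0] before perturbation.
    Assumes vocab holds at least two distinct entries.
    """
    p_keep = keep_probability(epsilon, len(vocab))
    noisy = []
    for token in tokens:
        true_token = token if token in vocab else vocab[0]
        if rng.random() < p_keep:
            noisy.append(true_token)
        else:
            others = [v for v in vocab if v != true_token]
            noisy.append(rng.choice(others))
    return noisy

def hybrid_deidentify(text, vocab, epsilon, seed=0):
    """Hybrid strategy: rule-based redaction first, then the DP step."""
    rng = random.Random(seed)
    tokens = redact(text).split()
    return " ".join(randomized_response(tokens, vocab, epsilon, rng))

def leakage(output_text, identifiers):
    """Fraction of known identifiers still present verbatim in the output."""
    if not identifiers:
        return 0.0
    leaked = sum(1 for item in identifiers if item in output_text)
    return leaked / len(identifiers)

# Invented example note and vocabulary, for illustration only.
NOTE = "Opname dhr. Jansen op 12-03-2021"
VOCAB = ["[UNK]", "[NAME]", "[DATE]", "Opname", "op"]
IDENTIFIERS = ["Jansen", "12-03-2021"]

if __name__ == "__main__":
    for eps in (0.5, 2.0, 8.0):
        out = hybrid_deidentify(NOTE, VOCAB, eps)
        print(eps, "|", out, "| leakage:", leakage(out, IDENTIFIERS))
```

In this toy setup, a smaller `epsilon` replaces more tokens with random vocabulary entries, which strengthens the per-token guarantee but degrades the readability of the note. This is the privacy and utility trade-off the study sets out to measure.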
Key facts
- First comparative study of DP, NER, and LLMs for Dutch clinical text de-identification
- Published as arXiv:2604.21421
- Evaluates methods individually and in hybrid strategies
- Hybrid strategies apply NER or LLM preprocessing before DP
- Performance assessed by privacy leakage and utility
- Motivated by GDPR and HIPAA compliance
- Manual de-identification is the gold standard but is costly and slow
- Study focuses on Dutch-language clinical notes
Entities
Institutions
- arXiv
Regulations
- GDPR
- HIPAA