ARTFEED — Contemporary Art Intelligence

Automated Dosing Error Detection in Clinical Trial Narratives Using LightGBM

other · 2026-04-24

A new automated system detects dosing errors in clinical trial narratives using gradient boosting and multi-modal feature engineering. The approach combines 3,451 features from NLP, semantic embeddings, medical patterns, and transformer models to train a LightGBM model on 42,112 narratives. On the CT-DEB benchmark, it achieves a test ROC-AUC of 0.8725 by averaging the predictions of a 5-fold ensemble.
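
The 5-fold ensemble averaging mentioned above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: it assumes one model is trained per fold on the other four folds and that test predictions are the plain mean across the five models. The names `stratified_folds`, `fit_stub`, and `fold_ensemble_predict` are hypothetical, and `fit_stub` is a toy one-feature scorer standing in for a LightGBM fit so the example runs with the standard library only.

```python
import math
import random
from statistics import mean

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with a similar positive rate in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class round-robin so rare positives are spread evenly.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def fit_stub(xs, ys):
    """Hypothetical stand-in for a LightGBM fit (assumption, not the real model).

    Learns the midpoint between the two class means and returns a scorer
    that maps a feature value to a probability-like score.
    """
    pos_mean = mean(x for x, y in zip(xs, ys) if y == 1)
    neg_mean = mean(x for x, y in zip(xs, ys) if y == 0)
    mid = (pos_mean + neg_mean) / 2
    return lambda x: 1 / (1 + math.exp(-(x - mid)))

def fold_ensemble_predict(train_x, train_y, test_x, k=5, seed=0):
    """Train one model per fold (holding that fold out) and average their scores."""
    models = []
    for held_out in stratified_folds(train_y, k, seed):
        held = set(held_out)
        xs = [x for i, x in enumerate(train_x) if i not in held]
        ys = [y for i, y in enumerate(train_y) if i not in held]
        models.append(fit_stub(xs, ys))
    return [mean(m(x) for m in models) for x in test_x]

# Toy data: 95 negatives with low feature values, 5 positives with high ones.
train_x = [i / 100 for i in range(95)] + [2.0 + 0.1 * j for j in range(5)]
train_y = [0] * 95 + [1] * 5
preds = fold_ensemble_predict(train_x, train_y, [0.0, 2.5])
print(preds)
```

Stratifying the folds matters when positives are rare, since a purely random split could leave a fold with almost no positive examples.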

Key facts

  • System uses LightGBM with 3,451 features
  • Features include TF-IDF, character n-grams, all-MiniLM-L6-v2 embeddings, BiomedBERT, DeBERTa-v3
  • Trained on 42,112 clinical trial narratives
  • Achieves 0.8725 test ROC-AUC
  • Cross-validation AUC: 0.8833 ± 0.0091
  • Dataset has severe class imbalance (4.9% positive rate)
  • Features extracted from nine complementary text fields
  • Median 5,400 characters per sample
