Semantic Search Deployed Across 166M Clinical Notes at Children's Hospital
At a large children's hospital, researchers deployed a semantic search system that indexes clinical notes from 1.68 million patients: 166 million notes, represented by 484 million vectors. The system uses instruction-tuned Qwen3-Embedding-0.6B embeddings, stores the vectors in a managed database with storage-optimized indexing, and keeps full text and metadata in a low-latency key-value store, all within a HIPAA-compliant governance framework. Three experiments using physician-authored queries evaluated the system and informed the choice of embedding model and chunking strategy. The work addresses the engineering, cost, and governance barriers that have limited the adoption of semantic search in health systems.
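The split between a vector index and a separate key-value store can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the in-memory dictionaries stand in for the managed vector database and the key-value store, the three-dimensional vectors stand in for real embeddings, the note texts are made-up placeholders, and the names `vector_index`, `kv_store`, `cosine`, and `search` are hypothetical. It is not the deployed system's code.

```python
import math

# Hypothetical stand-in for the vector database: chunk id -> embedding.
# Real embeddings would come from an embedding model; these are toy vectors.
vector_index = {
    "note1#0": [0.9, 0.1, 0.0],
    "note1#1": [0.1, 0.9, 0.0],
    "note2#0": [0.0, 0.2, 0.9],
}

# Hypothetical stand-in for the key-value store: chunk id -> text and metadata.
# The texts are invented placeholders, not real clinical content.
kv_store = {
    "note1#0": {"note_id": "note1", "text": "Placeholder chunk A."},
    "note1#1": {"note_id": "note1", "text": "Placeholder chunk B."},
    "note2#0": {"note_id": "note2", "text": "Placeholder chunk C."},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, k=2):
    """Rank chunks by similarity, then fetch their text from the key-value store."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in vector_index.items()]
    scored.sort(reverse=True)
    results = []
    for score, cid in scored[:k]:
        record = dict(kv_store[cid])  # copy so the store is not mutated
        record["chunk_id"] = cid
        record["score"] = score
        results.append(record)
    return results


hits = search([1.0, 0.0, 0.0], k=2)
for hit in hits:
    print(hit["chunk_id"], round(hit["score"], 3), hit["text"])
```

Keeping only vectors and ids in the index and resolving text by id afterward is one plausible reason for the two-store design: the similarity search touches compact vectors, and the larger text payload is read only for the top results.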
Key facts
- 166 million clinical notes indexed
- 484 million vectors
- 1.68 million patients
- Uses instruction-tuned Qwen3-Embedding-0.6B embeddings
- HIPAA-compliant governance framework
- Three experiments conducted
- Physician-authored queries used for evaluation
- Deployed at a large children's hospital
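The headline counts above imply two rough ratios, worked out in the short calculation below. The inputs are the rounded figures reported in this summary, so the results are approximate; the variable names are illustrative only.

```python
# Rounded figures as reported in the summary.
notes = 166_000_000
vectors = 484_000_000
patients = 1_680_000

# Average number of vectors stored per note.
vectors_per_note = vectors / notes

# Average number of notes per patient.
notes_per_patient = notes / patients

print(f"vectors per note: {vectors_per_note:.1f}")
print(f"notes per patient: {notes_per_patient:.0f}")
```

This comes to roughly 2.9 vectors per note and roughly 99 notes per patient. More than one vector per note is consistent with notes being split into chunks before embedding, although the summary does not state the chunk size.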
Entities
Institutions
- arXiv