BioHiCL AI Model Enhances Biomedical Information Retrieval Using Hierarchical Contrastive Learning
BioHiCL is a new model for biomedical information retrieval that uses hierarchical MeSH (Medical Subject Headings) annotations as structured supervision for multi-label contrastive learning. It targets a limitation of existing biomedical generative retrievers, which rely on coarse binary relevance signals (a text is either relevant or not) and therefore fail to capture partial semantic overlap between texts. Effective biomedical retrieval requires modeling both domain-specific semantics and the hierarchical relationships among texts, and BioHiCL takes that structure from the MeSH hierarchy.
Two versions of the model have been created: BioHiCL-Base with 0.1 billion parameters and BioHiCL-Large with 0.3 billion parameters. Both show promising performance on biomedical retrieval, sentence similarity assessment, and question answering while remaining computationally efficient enough for practical deployment. The research was posted to arXiv, a platform for scientific preprints, under Computer Science in the Information Retrieval category.
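To make the idea of hierarchical, multi-label supervision concrete, here is a minimal Python sketch. It is not BioHiCL's published method: this summary does not specify the loss or the similarity measure, so the functions `tree_similarity`, `label_overlap`, and `soft_contrastive_loss` are hypothetical, and the tree numbers are illustrative strings in the MeSH dotted format rather than verified descriptors. The sketch assumes that two texts whose labels share a longer path in the hierarchy should be treated as more similar, and it replaces a binary positive/negative target with graded weights.

```python
import math

def tree_similarity(a: str, b: str) -> float:
    """Shared-prefix depth divided by the depth of the deeper tree number."""
    parts_a, parts_b = a.split("."), b.split(".")
    shared = 0
    for x, y in zip(parts_a, parts_b):
        if x != y:
            break
        shared += 1
    return shared / max(len(parts_a), len(parts_b))

def label_overlap(labels_a, labels_b) -> float:
    """Symmetric best-match average between two label sets, in [0, 1]."""
    if not labels_a or not labels_b:
        return 0.0

    def directed(src, dst):
        return sum(max(tree_similarity(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(labels_a, labels_b) + directed(labels_b, labels_a))

def cosine(u, v) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

def soft_contrastive_loss(anchor, embeddings, labels, temperature=0.1) -> float:
    """Cross-entropy between a softmax over embedding similarities and
    label-overlap weights normalized to sum to 1 (an assumed formulation)."""
    others = [j for j in range(len(embeddings)) if j != anchor]
    logits = [cosine(embeddings[anchor], embeddings[j]) / temperature for j in others]
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
    weights = [label_overlap(labels[anchor], labels[j]) for j in others]
    total = sum(weights)
    if total == 0:
        return 0.0  # no related text in the batch, so nothing to pull together
    return -sum(w / total * (l - log_z) for w, l in zip(weights, logits))

# Illustrative labels: text 1 sits one level below text 0; text 2 is in another branch.
labels = [["C04.588.180"], ["C04.588.180.260"], ["C14.280.647"]]
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]     # related texts embedded close together
misaligned = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # related texts embedded far apart

loss_aligned = soft_contrastive_loss(0, aligned, labels)
loss_misaligned = soft_contrastive_loss(0, misaligned, labels)
```

In this toy batch the loss is lower when the two texts with overlapping labels are embedded close together, which is the behavior a graded signal is meant to reward. A binary signal would have to call texts 0 and 1 either a full match or a full mismatch.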
Key facts
- BioHiCL uses hierarchical MeSH annotations for multi-label contrastive learning
- Two model versions exist: BioHiCL-Base (0.1B parameters) and BioHiCL-Large (0.3B parameters)
- The models show promising performance on biomedical retrieval, sentence similarity, and question answering
- Existing biomedical retrievers use coarse binary relevance signals
- The research was published on arXiv under computer science/information retrieval
- The models maintain computational efficiency for deployment
- Biomedical retrieval requires modeling domain semantics and hierarchical relationships
Entities
Institutions
- arXiv