BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking
BioELX is a two-stage cross-lingual biomedical entity linking (BEL) framework that requires no task-specific annotated training data. In Stage 1, it enriches SapBERT training with multilingual aliases from Wikidata to improve candidate retrieval for non-English mentions. In Stage 2, a pre-trained LLM ranker performs context-aware disambiguation. The approach addresses two problems: the high cost of expert-annotated data and the poor generalization of existing systems to low-resource languages.
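The two-stage flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: character-trigram cosine similarity stands in for SapBERT embeddings, `overlap_scorer` stands in for the LLM ranker, and the `KB` contents, concept IDs, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy knowledge base: concept ID -> aliases.
# English names plus multilingual aliases of the kind Wikidata could supply.
KB = {
    "C001": ["myocardial infarction", "heart attack",
             "infarto de miocardio", "Herzinfarkt"],
    "C002": ["heart failure", "insuficiencia cardíaca", "Herzinsuffizienz"],
    "C003": ["stroke", "accidente cerebrovascular", "Schlaganfall"],
}

def char_ngrams(text, n=3):
    """Bag of character n-grams (a crude stand-in for a learned embedding)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(mention, kb, top_k=2):
    """Stage 1 stand-in: score each concept by its best-matching alias."""
    m = char_ngrams(mention)
    scored = [(max(cosine(m, char_ngrams(a)) for a in aliases), cid)
              for cid, aliases in kb.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:top_k]]

def overlap_scorer(mention, context, cid, aliases):
    """Placeholder for an LLM call: counts alias words found in the context."""
    words = set(context.lower().split())
    return sum(1 for a in aliases for w in a.lower().split() if w in words)

def rank(mention, context, candidates, kb, score_fn=overlap_scorer):
    """Stage 2 stand-in: pick the candidate the scorer rates highest."""
    return max(candidates, key=lambda cid: score_fn(mention, context, cid, kb[cid]))

def link(mention, context, kb, top_k=2):
    """Retrieve candidates, then rerank them using the mention's context."""
    return rank(mention, context, retrieve(mention, kb, top_k), kb)

if __name__ == "__main__":
    print(link("infarto", "el paciente sufrió un infarto de miocardio", KB))
```

In a real system the retriever would compare dense embeddings and `score_fn` would wrap a prompt to a pre-trained LLM; the sketch only shows how the two stages hand off a short candidate list.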
Key facts
- BioELX is a cross-lingual BEL framework.
- It has two stages: alias-based retrieval and LLM ranking.
- Stage 1 enriches SapBERT with Wikidata multilingual aliases.
- Stage 2 uses a pre-trained LLM for disambiguation.
- No task-specific annotated training data is required.
- It targets low-resource languages.
- Existing systems rely on English aliases in the knowledge base (KB).
- The paper is on arXiv with ID 2605.27380.
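The alias enrichment in Stage 1 can be illustrated with a small sketch. SapBERT-style training learns from pairs of names that share a concept, so one plausible way to add multilingual aliases is to merge them into each concept's name list and enumerate same-concept pairs. The function names, the sample data, and the pairing strategy are assumptions for illustration; the summary does not say how BioELX actually builds its training set.

```python
from itertools import combinations

def merge_aliases(kb_aliases, extra_aliases):
    """Union two alias maps per concept, dropping case-insensitive duplicates."""
    merged = {}
    for cid in kb_aliases.keys() | extra_aliases.keys():
        seen, names = set(), []
        for name in kb_aliases.get(cid, []) + extra_aliases.get(cid, []):
            key = name.casefold()
            if key not in seen:
                seen.add(key)
                names.append(name)
        merged[cid] = names
    return merged

def positive_pairs(aliases_by_concept):
    """Enumerate (concept, name_a, name_b) pairs of names sharing a concept."""
    pairs = []
    for cid, names in sorted(aliases_by_concept.items()):
        for a, b in combinations(names, 2):
            pairs.append((cid, a, b))
    return pairs

# Hypothetical sample data: English KB names and Wikidata-style aliases.
kb_names = {"C001": ["heart attack", "myocardial infarction"]}
wikidata_names = {"C001": ["Herzinfarkt", "Heart attack"]}

if __name__ == "__main__":
    for pair in positive_pairs(merge_aliases(kb_names, wikidata_names)):
        print(pair)
```

Cross-lingual pairs such as an English name paired with a German alias are the ones that would pull non-English mentions toward the right concept during retrieval.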
Entities
Institutions
- arXiv
- Wikidata