ORPHEAS: Bilingual Greek-English Embedding Model for RAG
Researchers propose ORPHEAS, a specialized Greek-English embedding model for bilingual retrieval-augmented generation (RAG). Existing multilingual models fail to optimize for Greek due to its morphological complexity and domain-specific terminology. ORPHEAS is trained on a high-quality dataset generated via a knowledge graph-based fine-tuning methodology applied to a diverse multi-domain corpus, enabling language-agnostic semantic representations. Numerical experiments show ORPHEAS outperforms state-of-the-art models on monolingual and cross-lingual retrieval benchmarks. The work addresses a gap in cross-lingual NLP for Greek-English applications.
Key facts
- ORPHEAS is a Greek-English embedding model for bilingual RAG.
- Existing multilingual models are suboptimal for Greek due to morphological complexity.
- Training uses a knowledge graph-based fine-tuning methodology.
- Dataset is generated from a diverse multi-domain corpus.
- ORPHEAS enables language-agnostic semantic representations.
- Outperforms state-of-the-art on retrieval benchmarks.
- Addresses cross-lingual NLP gap for Greek-English.
- Published on arXiv with ID 2604.20666.
Entities
Institutions
- arXiv