ORPHEAS: Bilingual Greek-English Embedding Model for RAG

other · 2026-04-24

Researchers propose ORPHEAS, a specialized Greek-English embedding model for bilingual retrieval-augmented generation (RAG). According to the authors, existing multilingual models are suboptimal for Greek because of its morphological complexity and domain-specific terminology. ORPHEAS is trained on a high-quality dataset generated by applying a knowledge graph-based fine-tuning methodology to a diverse multi-domain corpus, with the aim of producing language-agnostic semantic representations. The reported experiments show ORPHEAS outperforming state-of-the-art models on both monolingual and cross-lingual retrieval benchmarks. The work addresses a gap in cross-lingual NLP for Greek-English applications.
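To make the idea of cross-lingual retrieval in a shared embedding space concrete, here is a minimal sketch. It does not use ORPHEAS or any real model: the three-dimensional vectors are hand-made placeholders, and the helper names `cosine` and `retrieve` are hypothetical. It only illustrates the general pattern a bilingual RAG retriever follows, which is to rank Greek and English passages against one query by cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, corpus, top_k=2):
    """Return the top_k (score, text) pairs, highest similarity first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy vectors written by hand for illustration; they are NOT model outputs.
# In a language-agnostic space, a Greek passage and its English counterpart
# would be expected to lie close together.
corpus = [
    ("Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.", [0.90, 0.10, 0.00]),
    ("Athens is the capital of Greece.", [0.88, 0.12, 0.02]),
    ("Interest rates rose last quarter.", [0.05, 0.10, 0.95]),
]

# Stand-in for the embedding of an English query about Greece's capital.
query_vec = [0.85, 0.15, 0.05]

results = retrieve(query_vec, corpus)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

With these placeholder vectors, both the Greek and the English passage about Athens rank above the unrelated passage. That is the behavior a bilingual retriever aims for, whichever language the query is written in.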

Key facts

ORPHEAS is a Greek-English embedding model for bilingual RAG.
Existing multilingual models are suboptimal for Greek due to morphological complexity.
Training uses a knowledge graph-based fine-tuning methodology.
The training dataset is generated from a diverse multi-domain corpus.
ORPHEAS enables language-agnostic semantic representations.
It is reported to outperform state-of-the-art models on monolingual and cross-lingual retrieval benchmarks.
The work addresses a gap in cross-lingual NLP for Greek-English applications.
Published on arXiv with ID 2604.20666.
