ARTFEED — Contemporary Art Intelligence

ORPHEAS: Bilingual Greek-English Embedding Model for RAG

other · 2026-04-24

Researchers propose ORPHEAS, a specialized Greek-English embedding model for bilingual retrieval-augmented generation (RAG). The authors argue that existing multilingual models are not well optimized for Greek because of its morphological complexity and domain-specific terminology. ORPHEAS is trained on a high-quality dataset produced by a knowledge graph-based fine-tuning methodology applied to a diverse multi-domain corpus, with the goal of learning language-agnostic semantic representations. The reported experiments show ORPHEAS outperforming state-of-the-art models on both monolingual and cross-lingual retrieval benchmarks. The work addresses a gap in cross-lingual NLP for Greek-English applications.
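To make the retrieval step concrete, here is a minimal sketch of how a bilingual embedding model is typically used in a RAG pipeline: passages and the query are mapped to vectors, and passages are ranked by cosine similarity regardless of their language. The vectors below are hand-made toy values, not ORPHEAS outputs, and the names `cosine`, `retrieve`, `PASSAGES`, and `QUERY_VEC` are hypothetical; the summary does not describe the model's actual interface.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_vec, passages, k=2):
    """Return the texts of the k passages most similar to the query."""
    scored = [(cosine(query_vec, vec), text) for text, vec in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional embeddings (assumed values for illustration only).
# A language-agnostic model would place the Greek and English sentences
# about Athens close together and the unrelated sentence far away.
PASSAGES = [
    ("Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.", [0.90, 0.10, 0.00]),
    ("Athens is the capital of Greece.", [0.88, 0.12, 0.02]),
    ("Olive oil is pressed from olives.", [0.05, 0.10, 0.95]),
]

# Stand-in embedding for the English query "What is the capital of Greece?"
QUERY_VEC = [0.92, 0.08, 0.01]

top = retrieve(QUERY_VEC, PASSAGES, k=2)
for text in top:
    print(text)
```

With these toy vectors, both the Greek and the English passage about Athens rank above the unrelated one, which is the behavior a cross-lingual retriever aims for.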

Key facts

  • ORPHEAS is a Greek-English embedding model for bilingual RAG.
  • Existing multilingual models are suboptimal for Greek due to its morphological complexity and domain-specific terminology.
  • Training uses a knowledge graph-based fine-tuning methodology.
  • Dataset is generated from a diverse multi-domain corpus.
  • ORPHEAS enables language-agnostic semantic representations.
  • Outperforms state-of-the-art models on monolingual and cross-lingual retrieval benchmarks.
  • Addresses a gap in cross-lingual NLP for Greek-English applications.
  • Published on arXiv with ID 2604.20666.
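
Retrieval benchmarks of the kind mentioned above are commonly scored with rank-based metrics. The summary does not say which metrics the paper reports, so the sketch below uses recall@k purely as an assumed example; the function names and document IDs are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mean_recall_at_k(runs, k):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)


# Two made-up queries: the ranked list returned by a retriever, and the
# set of documents judged relevant for that query.
runs = [
    (["d1", "d4", "d2"], {"d1", "d2"}),  # one of two relevant docs in top 2
    (["d9", "d3", "d7"], {"d3"}),        # the only relevant doc is in top 2
]

score = mean_recall_at_k(runs, k=2)
print(score)
```

In a cross-lingual setting the same metric applies unchanged: the query is in one language and the relevant documents are in the other.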

Entities

Institutions

  • arXiv
