ARTFEED — Contemporary Art Intelligence

ML-Embed: Efficient Multilingual Text Embeddings

publication · 2026-05-16

A new research paper introduces ML-Embed, a suite of multilingual text embedding models designed to address three barriers to adoption: prohibitive computational costs, narrow linguistic focus, and a lack of transparency. The models are built on a 3-Dimensional Matryoshka Learning (3D-ML) framework that combines Matryoshka Representation Learning (MRL) for storage efficiency, Matryoshka Layer Learning (MLL) for flexible inference, and the newly introduced Matryoshka Embedding Learning (MEL) for parameter efficiency. The authors curated a massively multilingual training dataset, aiming to make high-quality embeddings more inclusive and efficient across a wide range of languages.
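The storage-efficiency idea behind MRL can be illustrated with a minimal sketch: a Matryoshka-trained embedding is optimized so that its leading coordinates form usable lower-dimensional embeddings, letting downstream systems truncate and re-normalize vectors to trade accuracy for storage. The function and dimensions below are hypothetical illustrations, not code or parameters from the ML-Embed paper.

```python
import numpy as np

def truncate_embedding(emb, dim):
    """Keep the first `dim` coordinates of a Matryoshka-style embedding
    and re-normalize, so the nested prefix can still be used for
    cosine-similarity search at a fraction of the storage cost."""
    sub = np.asarray(emb, dtype=np.float64)[:dim]
    norm = np.linalg.norm(sub)
    return sub / norm if norm > 0 else sub

# Hypothetical 8-dimensional embedding; real models typically use
# hundreds of dimensions (e.g. 768), truncated to 256, 128, etc.
full = np.array([0.4, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01])
small = truncate_embedding(full, 4)  # half the storage, unit-norm prefix
```

Because cosine similarity only requires unit-norm vectors, the truncated prefix drops into an existing retrieval index unchanged; this is the mechanism MRL relies on, though the exact dimension schedule in ML-Embed is not specified in this summary.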

Key facts

  • ML-Embed is a suite of inclusive and efficient text embedding models.
  • The models address prohibitive computational costs, narrow linguistic focus, and lack of transparency.
  • The framework is called 3-Dimensional Matryoshka Learning (3D-ML).
  • 3D-ML includes MRL, MLL, and the newly introduced MEL.
  • MEL enhances parameter efficiency.
  • A massively multilingual dataset was curated for training.
  • The paper is available on arXiv with ID 2605.15081.
  • The research aims to democratize high-quality embeddings for many languages.

Entities

Institutions

  • arXiv

Sources