ARTFEED — Contemporary Art Intelligence

MONET: Open-Source Dataset of 104.9M Image-Text Pairs Released

digital · 2026-05-22

MONET is a newly released open dataset, licensed under Apache 2.0, of about 104.9 million image-text pairs curated from 2.9 billion raw pairs gathered from heterogeneous open sources. The raw pool went through several stages of safety and domain filtering, and exact and near-duplicates were removed. Images were re-captioned with multiple vision-language models, producing captions that range from short to long, and the collection was augmented with synthetically generated samples. Each image comes with pre-computed embeddings and annotations to support downstream applications. To validate the dataset, a 4-billion-parameter latent diffusion model was trained solely on MONET and achieved competitive GenEval and DPG scores. The release is intended to support open, reproducible research in text-to-image generation.
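The article does not say how MONET detects duplicates, so the following is only a minimal sketch of what exact and near-duplicate removal can look like in general. It assumes exact duplicates are caught with a SHA-256 digest of the raw image bytes and near-duplicates with the Hamming distance between 64-bit perceptual hashes. The record layout, the `dedupe` function, and the `max_distance` threshold are all hypothetical and are not taken from the MONET release.

```python
import hashlib


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedupe(records, max_distance=4):
    """Return the ids of records that survive deduplication.

    records: iterable of (record_id, raw_bytes, perceptual_hash) tuples.
    The perceptual hash is assumed to be precomputed elsewhere (hypothetical).
    max_distance: illustrative threshold, not a value from the article.
    """
    seen_digests = set()   # digests of every byte payload seen so far
    kept_hashes = []       # perceptual hashes of the records we kept
    kept_ids = []
    for record_id, raw, phash in records:
        # Exact duplicate: identical bytes give an identical digest.
        digest = hashlib.sha256(raw).hexdigest()
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        # Near duplicate: perceptual hash close to one already kept.
        if any(hamming(phash, h) <= max_distance for h in kept_hashes):
            continue
        kept_hashes.append(phash)
        kept_ids.append(record_id)
    return kept_ids


if __name__ == "__main__":
    sample = [
        ("a", b"x", 0b0000),
        ("b", b"x", 0b0000),   # same bytes as "a"
        ("c", b"y", 0b0001),   # one bit away from "a"
        ("d", b"z", 0xFFFF),   # sixteen bits away from "a"
    ]
    print(dedupe(sample))
```

The linear scan over kept hashes is quadratic in the worst case, so a pipeline working at the scale of billions of pairs would need an approximate nearest-neighbour index or a similar structure instead. The sketch is meant only to make the two kinds of duplicate concrete.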

Key facts

  • MONET dataset contains ~104.9M image-text pairs
  • Sourced from 2.9B raw pairs across heterogeneous open sources
  • Processed with safety filtering, domain filtering, deduplication, and re-captioning
  • Re-captioned with multiple vision-language models
  • Augmented with synthetically generated samples
  • Each image has pre-computed embeddings and annotations
  • A 4B-parameter latent diffusion model trained on MONET achieved competitive GenEval and DPG scores
  • Dataset released under Apache 2.0 license
