ARTFEED — Contemporary Art Intelligence

Ukrainian Handwritten Text Dataset and Cross-Domain Style Transfer Model

digital · 2026-05-28

Researchers have adapted the diffusion-based DiffusionPen model to generate handwritten Ukrainian text, addressing a gap in handwriting generation for non-Latin scripts. They compiled a dataset of 126,177 word images from 308 writers, built with connected-component segmentation and quality filtering, and oversampled less common Ukrainian characters. DiffusionPen combines a MobileNetV2 style encoder trained with triplet loss and a CANINE-conditioned latent diffusion U-Net; it was retrained on the new dataset without any changes to its architecture. The research examines cross-domain style transfer in three scenarios (cross-lingual transfer from IAM English samples, zero-shot transfer, and fine-tuning) to assess how well existing models generalize beyond Latin scripts. The findings are available on arXiv under ID 2605.27487.
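The oversampling of less common Ukrainian characters can be illustrated with a minimal sketch. The summary does not say how the researchers weighted samples, so the inverse-frequency rule, the function name rare_char_weights, the power parameter, and the toy word list below are all assumptions made for illustration.

```python
from collections import Counter


def rare_char_weights(words, power=1.0):
    """Return one sampling weight per word (assumed scheme, not the paper's).

    A word's weight is the inverse relative frequency of its rarest
    character, so words containing rare letters are drawn more often.
    Words are assumed to be non-empty strings.
    """
    counts = Counter(ch for w in words for ch in w)
    total = sum(counts.values())
    weights = []
    for w in words:
        # Relative frequency of the rarest character in this word.
        rarest = min(counts[ch] for ch in w) / total
        weights.append((1.0 / rarest) ** power)
    norm = sum(weights)
    # Normalize so the weights form a probability distribution.
    return [x / norm for x in weights]


# Toy labels: "ґ" and "ї" are rare here, "м" and "а" are common.
words = ["мама", "тато", "ґанок", "їжак", "мама"]
weights = rare_char_weights(words)
```

Weights of this kind could be passed to random.choices(words, weights=weights, k=n) to draw a training batch in which rare letters appear more often than in the raw data.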

Key facts

  • Dataset of 126,177 Ukrainian handwritten word images from 308 writers
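  • Retrains DiffusionPen (MobileNetV2 triplet-loss style encoder, CANINE-conditioned latent diffusion U-Net) without architectural changes
  • Tests cross-domain style transfer from Latin to Cyrillic in three settings: cross-lingual transfer from IAM English samples, zero-shot transfer, and fine-tuning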
  • Addresses underexplored non-Latin handwriting generation for Ukrainian
  • Published on arXiv with ID 2605.27487
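
The triplet-loss objective named for the style encoder above can be sketched in a few lines. This is the generic textbook form, not the model's actual training code: the Euclidean distance, the margin value, and the function name triplet_loss are assumptions, and the embeddings are plain lists standing in for encoder outputs.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet loss on embedding vectors (illustrative only).

    anchor and positive would come from the same writer, negative from
    a different writer. The loss is zero once the negative is at least
    `margin` farther from the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)  # distance to same-writer sample
    d_neg = math.dist(anchor, negative)  # distance to other-writer sample
    return max(0.0, d_pos - d_neg + margin)


# Same-writer pair is close, other-writer sample is far: no penalty.
well_separated = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])

# Same-writer pair is far, other-writer sample is close: penalized.
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

Minimizing this quantity pulls samples from the same writer together in embedding space and pushes different writers apart, which is what lets an embedding act as a style descriptor.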

Entities

Institutions

  • arXiv

Sources