ARTFEED — Contemporary Art Intelligence

SignVerse-2M: Two Million Clips Covering 25+ Sign Languages for Pose-Based Recognition

digital · 2026-05-06

Researchers have introduced SignVerse-2M, a large-scale dataset containing two million video clips across more than 25 sign languages. Unlike existing resources that rely on RGB video-text alignment, SignVerse-2M provides pose-native keypoint representations extracted with DWPose, so it interfaces directly with modern pose-driven recognition and generation models. The dataset addresses a key limitation of RGB-based models, which are sensitive to background and clothing variation, and fills a gap in resources for open-world sign language recognition and translation. The work is presented in a preprint on arXiv (2605.01720).

Key facts

  • SignVerse-2M contains two million video clips.
  • The dataset covers over 25 sign languages.
  • It uses DWPose keypoint representations.
  • It targets open-world recognition and translation.
  • RGB-based models are less robust in open settings.
  • The dataset is pose-native, not RGB-based.
  • It supports pose-driven video generation.
  • The preprint is on arXiv (2605.01720).
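The pose-native format described above can be illustrated with a minimal sketch. This assumes DWPose-style whole-body output (133 keypoints per frame in the COCO-WholeBody layout, each as x, y, confidence); the actual SignVerse-2M schema, field names, and preprocessing are not specified in the announcement, so everything below is a hypothetical example of how such keypoint clips might be normalized before feeding a recognition model.

```python
import numpy as np

def normalize_clip(keypoints: np.ndarray, conf_threshold: float = 0.3) -> np.ndarray:
    """Normalize a pose clip of shape (T, 133, 3) with (x, y, confidence)
    per keypoint: center on the shoulder midpoint, scale by shoulder
    width, and zero out low-confidence detections."""
    xy = keypoints[..., :2].astype(np.float64)
    conf = keypoints[..., 2]

    # COCO-WholeBody body indices: 5 = left shoulder, 6 = right shoulder.
    center = xy[:, 5:7].mean(axis=1, keepdims=True)       # (T, 1, 2)
    scale = np.linalg.norm(xy[:, 5] - xy[:, 6], axis=-1)  # (T,)
    scale = np.maximum(scale, 1e-6)[:, None, None]        # avoid divide-by-zero

    normed = (xy - center) / scale
    normed[conf < conf_threshold] = 0.0  # mask unreliable keypoints
    return normed

# Toy clip: 4 frames of random detections standing in for real pose data.
rng = np.random.default_rng(0)
clip = rng.uniform(0.0, 1.0, size=(4, 133, 3))
out = normalize_clip(clip)
print(out.shape)  # (4, 133, 2)
```

Working on keypoint arrays like this, rather than raw RGB frames, is what makes the representation invariant to background and clothing: the model only ever sees normalized joint coordinates.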

Entities

Institutions

  • arXiv
