SignVerse-2M: Two Million Clips Covering 25+ Sign Languages for Pose-Based Recognition
Researchers have introduced SignVerse-2M, a large-scale dataset containing two million video clips across more than 25 sign languages. Unlike existing resources that rely on RGB video-text alignment, SignVerse-2M provides pose-native keypoint representations extracted with DWPose, allowing direct use with modern pose-driven recognition and generation models. The dataset addresses a key limitation of RGB-based models, which are sensitive to background and clothing variations, and fills a gap in resources for open-world sign language recognition and translation. The work is presented in a preprint on arXiv (2605.01720).
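As a rough sketch of how pose-native data of this kind might be consumed downstream, the snippet below normalizes per-frame keypoints before feeding them to a pose-driven model. It assumes DWPose's usual COCO-WholeBody layout (133 keypoints per frame with confidence scores); the array shapes, threshold, and function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def normalize_keypoints(kpts, conf, conf_thresh=0.3):
    """Center and scale keypoints per frame so clips are comparable
    across signers and camera setups.

    kpts: (T, K, 2) float array of (x, y) image coordinates
          (K = 133 for DWPose / COCO-WholeBody; illustrative here)
    conf: (T, K) per-keypoint confidence scores
    Returns normalized keypoints and a boolean validity mask.
    """
    kpts = kpts.astype(float).copy()
    mask = conf >= conf_thresh  # drop low-confidence detections
    for t in range(len(kpts)):
        pts = kpts[t][mask[t]]
        if len(pts) == 0:
            continue  # no reliable keypoints in this frame
        center = pts.mean(axis=0)
        # scale by the farthest confident point from the center
        scale = np.linalg.norm(pts - center, axis=1).max()
        kpts[t] = (kpts[t] - center) / max(scale, 1e-6)
    return kpts, mask
```

Normalizing out position and scale in this way is one reason pose-native data is more robust than raw RGB: background and clothing never enter the representation at all.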
Key facts
- SignVerse-2M contains two million video clips.
- The dataset covers over 25 sign languages.
- It uses DWPose keypoint representations.
- It targets open-world recognition and translation.
- RGB-based models are less robust in open settings.
- The dataset is pose-native, not RGB-based.
- It supports pose-driven video generation.
- The preprint is on arXiv (2605.01720).
Entities
Institutions
- arXiv