SignVerse-2M: Two Million Clips Covering 25+ Sign Languages for Pose-Based Recognition
Researchers have introduced SignVerse-2M, a large-scale dataset containing two million video clips across more than 25 sign languages. Unlike existing resources that rely on RGB video-text alignment, SignVerse-2M provides pose-native keypoint representations extracted with DWPose, allowing direct use with modern pose-driven recognition and generation models. The dataset addresses a key limitation of RGB-based models, which are sensitive to background and clothing variations, and fills a gap in resources for open-world sign language recognition and translation. The work is presented in a preprint on arXiv (2605.01720).
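As a rough sketch of how pose-native data of this kind might be consumed downstream, the snippet below normalizes per-frame keypoints before feeding them to a pose-driven model. It assumes DWPose's usual COCO-WholeBody layout (133 keypoints per frame with confidence scores); the array shapes, threshold, and function name are illustrative assumptions, not details from the paper.

```python
import numpy as np

def normalize_keypoints(kpts, conf, conf_thresh=0.3):
    """Center and scale keypoints per frame so clips are comparable
    across signers and camera setups.

    kpts: (T, K, 2) float array of (x, y) image coordinates
          (K = 133 for DWPose / COCO-WholeBody; illustrative here)
    conf: (T, K) per-keypoint confidence scores
    Returns normalized keypoints and a boolean validity mask.
    """
    kpts = kpts.astype(float).copy()
    mask = conf >= conf_thresh  # drop low-confidence detections
    for t in range(len(kpts)):
        pts = kpts[t][mask[t]]
        if len(pts) == 0:
            continue  # no reliable keypoints in this frame
        center = pts.mean(axis=0)
        # scale by the farthest confident point from the center
        scale = np.linalg.norm(pts - center, axis=1).max()
        kpts[t] = (kpts[t] - center) / max(scale, 1e-6)
    return kpts, mask
```

Normalizing out position and scale in this way is one reason pose-native data is more robust than raw RGB: background and clothing never enter the representation at all.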
Key facts
- SignVerse-2M contains two million video clips.
- The dataset covers over 25 sign languages.
- It uses DWPose keypoint representations.
- It targets open-world recognition and translation.
- RGB-based models are less robust in open settings.
- The dataset is pose-native, not RGB-based.
- It supports pose-driven video generation.
- The preprint is on arXiv (2605.01720).
Entities
Institutions
- arXiv