GPT-Image-2 Twitter Dataset Tracks AI Imagery After OpenAI Release
Researchers have released the GPT-Image-2 Twitter Dataset, the first published dataset of images generated by OpenAI's GPT-image-2 model. It is drawn from public Twitter/X posts made after the model's launch on April 21, 2026. Using the Twitter API v2 and a multi-stage curation process that combines multilingual text heuristics (English, Japanese, and Chinese), automated browser checks for Twitter's "Made with AI" label, and matching of model name variants, the team collected 27,662 records over six days and confirmed 10,217 of them as GPT-image-2 images. The accompanying analyses cover CLIP-based zero-shot subject classification, text legibility (82.0% of images contain detectable text), and face detection (59.2% of images contain faces, 22,583 faces in total). The release comes as real photographs and synthetic images become increasingly difficult to tell apart.
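The curation stages described above can be illustrated with a minimal sketch. It is not the researchers' pipeline: the regular expression, the cue phrases, the `Post` fields, and the rule for combining the stages are all hypothetical, chosen only to show how model name matching, multilingual text cues, and a "Made with AI" label flag might be combined into one filter.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for model name variants such as "GPT-Image-2",
# "gpt image 2", or "gptimage2". The source does not list the real variants.
MODEL_NAME_PATTERN = re.compile(r"gpt[\s\-_]?image[\s\-_]?2", re.IGNORECASE)

# Hypothetical cue phrases in English, Japanese, and Chinese that suggest
# the poster generated the attached image.
GENERATION_CUES = ("generated with", "made with", "で生成", "生成的", "生成了")


@dataclass
class Post:
    text: str
    has_ai_label: bool  # assumed to be filled in by a separate browser check


def mentions_model(text: str) -> bool:
    """Return True if the text contains a model name variant."""
    return MODEL_NAME_PATTERN.search(text) is not None


def has_generation_cue(text: str) -> bool:
    """Return True if the text contains any of the multilingual cue phrases."""
    lowered = text.lower()
    return any(cue in lowered for cue in GENERATION_CUES)


def is_confirmed(post: Post) -> bool:
    """Assumed rule: the model must be named, and either the platform label
    or a textual cue must back up the claim that the image is generated."""
    return mentions_model(post.text) and (
        post.has_ai_label or has_generation_cue(post.text)
    )


posts = [
    Post("Made with GPT-Image-2, love the lighting", has_ai_label=False),
    Post("gpt image 2で生成した画像", has_ai_label=False),
    Post("Nice sunset photo from my trip", has_ai_label=True),
    Post("gptimage2 benchmark thread", has_ai_label=False),
]
confirmed = [p for p in posts if is_confirmed(p)]
```

In this toy run only the first two posts pass. The third carries the label but never names the model, and the fourth names the model without any sign that the attached image was generated by it.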
Key facts
- Dataset sourced from Twitter/X posts after GPT-image-2 release on April 21, 2026
- 10,217 confirmed GPT-image-2 images from 27,662 records over six days
- Multi-stage curation: multilingual heuristics (English, Japanese, Chinese), "Made with AI" label verification, model name matching
- 82.0% of images contain detectable text (OCR analysis)
- 59.2% of images contain faces (22,583 total faces)
- CLIP-based zero-shot subject taxonomy applied
- First published dataset of GPT-image-2 generated images
- Boundary between photographic reality and synthetic content increasingly difficult to discern
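The headline figures above imply a few derived quantities, worked through in the short sketch below. It assumes that the 82.0% and 59.2% shares are measured against the 10,217 confirmed images rather than the 27,662 collected records, which the summary does not state explicitly. The variable names are illustrative.

```python
# Figures quoted in the key facts.
total_records = 27_662
confirmed_images = 10_217
total_faces = 22_583
text_image_share = 0.820  # share of images with detectable text
face_image_share = 0.592  # share of images containing faces

# Share of collected records that survived curation.
retention = confirmed_images / total_records

# Assumption: both shares use the confirmed images as their denominator.
images_with_text = round(confirmed_images * text_image_share)
images_with_faces = round(confirmed_images * face_image_share)

# Average number of faces among images that contain at least one face.
faces_per_face_image = total_faces / images_with_faces

print(f"retention: {retention:.1%}")
print(f"images with text: {images_with_text}")
print(f"images with faces: {images_with_faces}")
print(f"faces per face-containing image: {faces_per_face_image:.2f}")
```

Under that assumption, about 36.9% of collected records were confirmed, and images that contain faces average between three and four faces each.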
Entities
Institutions
- OpenAI
- X
- arXiv