Latent Space Probing for Adult Content Detection in Video Generative Models
Researchers propose a framework for detecting adult content in AI-generated videos by intercepting latent representations during the generation process. The method attaches lightweight classifiers to the CogVideoX video diffusion model and analyzes its denoised latent representations in real time, while inference is still running. To train and evaluate the classifiers, the authors built a dataset of 11,039 ten-second video clips (5,086 violating, 5,953 non-violating) drawn from adult websites and YouTube. Two probing classifier architectures were introduced and evaluated. The approach targets a gap in existing safeguards, which inspect only the text prompt or the decoded pixel-space output and therefore never see the model's internal representations.
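To make the idea of a latent-space probe concrete, here is a minimal sketch in plain Python. It does not reproduce either of the paper's two probe architectures, which this summary does not describe, and it does not call CogVideoX. Everything in it is an assumption for illustration: the `LinearProbe` class, the mean-pooling step, the `generate_with_probe` loop, the `denoise_step` callback standing in for one step of a diffusion model, and the synthetic `toy_latent` data.

```python
import math
import random


def mean_pool(latent):
    """Average a latent shaped [frames][channels][positions] to one value per channel."""
    n_channels = len(latent[0])
    sums = [0.0] * n_channels
    count = 0
    for frame in latent:
        for c, channel in enumerate(frame):
            sums[c] += sum(channel)
        count += len(frame[0])  # positions per channel in this frame
    return [s / count for s in sums]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearProbe:
    """Hypothetical lightweight classifier: logistic regression on pooled latents."""

    def __init__(self, n_channels):
        self.w = [0.0] * n_channels
        self.b = 0.0

    def score(self, latent):
        x = mean_pool(latent)
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def fit(self, latents, labels, lr=0.5, epochs=50):
        feats = [mean_pool(latent) for latent in latents]
        for _ in range(epochs):
            for x, y in zip(feats, labels):
                p = sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))
                g = p - y  # gradient of the log loss with respect to the logit
                self.w = [wi - lr * g * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * g


def generate_with_probe(denoise_step, latent, n_steps, probe, threshold=0.5):
    """Run a denoising loop, scoring the latent after every step.

    Returns (None, step) if the probe fires, else (final_latent, n_steps).
    """
    for step in range(n_steps):
        latent = denoise_step(latent, step)
        if probe.score(latent) >= threshold:
            return None, step  # stop before anything is decoded to pixels
    return latent, n_steps


def toy_latent(label, rng, frames=2, channels=4, positions=8):
    # Synthetic stand-in: the label shifts the mean of channel 0 only.
    shift = 1.0 if label else -1.0
    return [[[rng.gauss(shift if c == 0 else 0.0, 0.1) for _ in range(positions)]
             for c in range(channels)] for _ in range(frames)]


rng = random.Random(0)
labels = [i % 2 for i in range(40)]
probe = LinearProbe(n_channels=4)
probe.fit([toy_latent(y, rng) for y in labels], labels)

identity_step = lambda lat, step: lat  # placeholder for a real denoising step
blocked = generate_with_probe(identity_step, toy_latent(1, rng), 5, probe)
allowed = generate_with_probe(identity_step, toy_latent(0, rng), 5, probe)
```

The point of the sketch is the placement of the check: the probe reads the latent inside the sampling loop, so a flagged generation can be stopped without ever decoding frames. A real probe would likely need a richer architecture and real latents from the model.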
Key facts
- Framework intercepts latent representations from CogVideoX video diffusion model.
- Lightweight classifiers attached for real-time adult content detection.
- Dataset of 11,039 ten-second clips: 5,086 violating, 5,953 non-violating.
- Clips sourced from adult websites and YouTube.
- Two probing classifier architectures introduced.
- Existing methods inspect only prompts or decoded pixel-space outputs and miss internal representations; the framework targets that gap.
- Published on arXiv with ID 2605.00874.
- Proposed method operates during inference.
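The class counts listed above imply a mildly imbalanced dataset. The short calculation below is not from the paper; it is a sanity check on the reported numbers, plus two derived quantities (a majority-class baseline and inverse-frequency class weights) that are common conventions assumed here for illustration.

```python
violating = 5086
non_violating = 5953
total = violating + non_violating  # should match the reported 11,039 clips

# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(violating, non_violating) / total

# Inverse-frequency weights, normalized so a perfectly balanced set gives 1.0 each.
class_weights = {
    "violating": total / (2 * violating),
    "non_violating": total / (2 * non_violating),
}

print(f"total clips: {total}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
print(f"class weights: {class_weights}")
```

Any reported detection accuracy is best read against that baseline of roughly 54%, which a classifier reaches by always answering "non-violating".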
Entities
Institutions
- arXiv
Models
- CogVideoX