Self-Supervised Learning of Video Speed Perception and Control
A new paper on arXiv (2604.21931) introduces self-supervised models for perceiving and controlling the flow of time in videos. The researchers exploit multimodal cues and temporal structure to detect speed changes and estimate playback speed without any labeled data. Using these models, they curated the largest slow-motion video dataset to date from noisy, in-the-wild sources. The same models also provide temporal control, enabling video generation at different playback speeds.
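The core self-supervised idea can be illustrated with a toy label-generation step: subsample a frame sequence at a chosen factor and use that factor as a free training label for a speed classifier. This is a minimal sketch under that assumption; the function name and the candidate speed set are hypothetical, not taken from the paper.

```python
import numpy as np

def make_speed_example(frames, speed, clip_len=8, rng=None):
    """Hypothetical helper: build one (clip, label) training pair.

    Subsamples `frames` at stride `speed`, so the playback-speed
    label comes for free from the sampling itself (no annotation).
    """
    rng = rng or np.random.default_rng()
    needed = (clip_len - 1) * speed + 1          # frames spanned by the clip
    start = rng.integers(0, len(frames) - needed + 1)
    idx = start + np.arange(clip_len) * speed    # evenly strided frame indices
    return frames[idx], speed

# Toy "video": 64 frames, each a 4x4 array whose values encode frame order.
video = np.arange(64 * 16).reshape(64, 4, 4)
speeds = [1, 2, 4, 8]                            # assumed candidate speed labels
clip, label = make_speed_example(video, speed=4)
```

A training loop would draw random speeds from the candidate set and ask a network to recover `label` from `clip` alone, which is the self-supervised signal the summary describes.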
Key facts
- Paper on arXiv: 2604.21931
- Self-supervised learning of speed changes and playback speed estimation
- Curated largest slow-motion video dataset from in-the-wild sources
- Models enable temporal control for video generation at different speeds