Self-Supervised Learning of Video Speed Perception and Control
A new paper on arXiv (2604.21931) introduces self-supervised models for perceiving and controlling the flow of time in videos. The researchers exploit multimodal cues and temporal structure to detect speed changes and estimate playback speed without any labeled data. Using these models, they curated the largest slow-motion video dataset to date from noisy, in-the-wild sources. The same models also provide temporal control, enabling video generation at different playback speeds.
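The core self-supervised idea can be illustrated with a toy label-generation step: subsample a frame sequence at a chosen factor and use that factor as a free training label for a speed classifier. This is a minimal sketch under that assumption; the function name and the candidate speed set are hypothetical, not taken from the paper.

```python
import numpy as np

def make_speed_example(frames, speed, clip_len=8, rng=None):
    """Hypothetical helper: build one (clip, label) training pair.

    Subsamples `frames` at stride `speed`, so the playback-speed
    label comes for free from the sampling itself (no annotation).
    """
    rng = rng or np.random.default_rng()
    needed = (clip_len - 1) * speed + 1          # frames spanned by the clip
    start = rng.integers(0, len(frames) - needed + 1)
    idx = start + np.arange(clip_len) * speed    # evenly strided frame indices
    return frames[idx], speed

# Toy "video": 64 frames, each a 4x4 array whose values encode frame order.
video = np.arange(64 * 16).reshape(64, 4, 4)
speeds = [1, 2, 4, 8]                            # assumed candidate speed labels
clip, label = make_speed_example(video, speed=4)
```

A training loop would draw random speeds from the candidate set and ask a network to recover `label` from `clip` alone, which is the self-supervised signal the summary describes.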
Key facts
- Paper on arXiv: 2604.21931
- Self-supervised learning of speed changes and playback speed estimation
- Curated largest slow-motion video dataset from in-the-wild sources
- Models enable temporal control for video generation at different speeds