Swift Sampling: AI Frame Selection via Taylor Series
Swift Sampling is a new training-free algorithm that selects high-information frames from long-form video by modeling the sequence of visual features as a differentiable trajectory in latent space. It estimates the velocity and acceleration of those features, then uses a Taylor expansion to predict how subsequent frames should evolve. Frames whose actual features diverge from this predicted manifold are treated as temporally surprising and sampled. The method adds only 0.02x computational overhead over the baseline and requires no auxiliary networks or video-specific hyperparameter tuning. The design is inspired by predictive coding in the human brain: it targets the moments where observed features deviate from their expected evolution.
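One way to make the prediction step concrete is a second-order Taylor expansion with finite-difference derivatives. The formulation below is an illustrative assumption, not the published definition: the symbols $f_t$ (feature vector of frame $t$), $v_t$, $a_t$, $\hat{f}_{t+1}$ and $s_{t+1}$, the unit time step, and the choice of the Euclidean norm are all introduced here for the sake of the example.

```latex
\begin{align}
v_t &= f_t - f_{t-1}
  && \text{(velocity, backward difference)} \\
a_t &= v_t - v_{t-1} = f_t - 2 f_{t-1} + f_{t-2}
  && \text{(acceleration)} \\
\hat{f}_{t+1} &= f_t + v_t + \tfrac{1}{2}\, a_t
  && \text{(second-order Taylor prediction)} \\
s_{t+1} &= \bigl\lVert f_{t+1} - \hat{f}_{t+1} \bigr\rVert_2
  && \text{(surprise score)}
\end{align}
```

Under these assumptions, a feature trajectory that moves at constant velocity gives $a_t = 0$ and $\hat{f}_{t+1} = f_{t+1}$, so its surprise score is zero. Only frames that break the locally extrapolated motion receive a high score.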
Key facts
- Swift Sampling is a training-free frame selection algorithm.
- It models video as a differentiable trajectory in visual latent space.
- It computes velocity and acceleration of visual features.
- Taylor expansion projects the expected path of subsequent frames.
- Frames diverging from the predicted manifold are selected as temporally surprising.
- The method adds only 0.02x additional computational cost.
- It requires no auxiliary networks or video-specific hyperparameter tuning.
- The algorithm is inspired by predictive coding in the human brain.
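The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `taylor_surprise` and `select_surprising_frames`, the use of backward finite differences for velocity and acceleration, the Euclidean distance as the divergence measure, and top-k selection are all hypothetical choices made for the example. Frame features are assumed to be precomputed by some visual encoder.

```python
import numpy as np


def taylor_surprise(features: np.ndarray) -> np.ndarray:
    """Score each frame by how far it lands from a Taylor prediction.

    features: (T, D) array of per-frame feature vectors (assumed to be
    precomputed). Returns a length-T array of scores. The first three
    frames score 0 because three earlier frames are needed to estimate
    both velocity and acceleration.
    """
    f = np.asarray(features, dtype=np.float64)
    scores = np.zeros(f.shape[0])

    prev1 = f[2:-1]  # frame t-1 for every target frame t >= 3
    prev2 = f[1:-2]  # frame t-2
    prev3 = f[0:-3]  # frame t-3

    velocity = prev1 - prev2                    # first backward difference
    acceleration = prev1 - 2.0 * prev2 + prev3  # second backward difference

    # Second-order Taylor step with a unit time step (an assumption).
    predicted = prev1 + velocity + 0.5 * acceleration

    # Surprise = Euclidean distance between actual and predicted features.
    scores[3:] = np.linalg.norm(f[3:] - predicted, axis=1)
    return scores


def select_surprising_frames(features: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring frames, in time order."""
    scores = taylor_surprise(features)
    k = min(k, scores.shape[0])
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top)


if __name__ == "__main__":
    # Toy trajectory: steady drift with an abrupt jump at frame 6.
    feats = np.arange(10, dtype=float).reshape(-1, 1)
    feats[6:] += 10.0
    print(taylor_surprise(feats))
    print(select_surprising_frames(feats, k=3))
```

In this toy example the steady drift before the jump scores zero, while the frame at the jump and the two frames after it score high, because their prediction history still contains the discontinuity. A practical implementation would likely need some way to avoid picking several near-duplicate frames around one event; how Swift Sampling handles this is not described here.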