Φ-Noise: Training-Free Temporal Video Conditioning via Phase-Based Noise Manipulation
Φ-Noise is a training-free method for generating motion-conditioned videos: it requires no additional training or fine-tuning of the underlying diffusion model. It extracts low-frequency phase information from a reference video and injects it into the diffusion noise latents, which transfers motion cues without modifying the model architecture or the inference pipeline. The reported results are competitive with or superior to those of more complex conditioning approaches. The work is motivated by findings on the importance of frequency components in generative models, and it demonstrates control over both appearance and dynamics in generated videos across several applications.
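The idea of injecting low-frequency phase into noise can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `inject_low_freq_phase`, the use of a 3D FFT over (frames, height, width), the radial low-pass mask, and the `cutoff` parameter are all assumptions made for illustration. The sketch keeps the magnitude spectrum of the noise and replaces only its phase inside the low-frequency band with the phase of the reference.

```python
import numpy as np


def inject_low_freq_phase(noise, reference, cutoff=0.1):
    """Hypothetical sketch: copy low-frequency phase from `reference` into `noise`.

    Both inputs are real arrays of shape (T, H, W). `cutoff` is a radius in
    normalized frequency units (cycles per sample); the paper's actual band
    selection may differ.
    """
    if noise.shape != reference.shape:
        raise ValueError("noise and reference must have the same shape")

    # Spatio-temporal spectra of the noise and of the reference.
    noise_f = np.fft.fftn(noise)
    ref_f = np.fft.fftn(reference)

    # Radial distance of every frequency bin from the DC component.
    grids = np.meshgrid(*[np.fft.fftfreq(n) for n in noise.shape], indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))
    low = radius <= cutoff  # assumed low-pass band

    # Reference phase inside the band, original noise phase elsewhere.
    phase = np.where(low, np.angle(ref_f), np.angle(noise_f))

    # Recombine with the untouched noise magnitude and return to signal space.
    mixed = np.abs(noise_f) * np.exp(1j * phase)
    return np.fft.ifftn(mixed).real


rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 8, 8))      # stand-in for one latent channel
reference = rng.standard_normal((4, 8, 8))  # stand-in for the reference video
conditioned = inject_low_freq_phase(noise, reference, cutoff=0.3)
print(conditioned.shape)
```

In a real pipeline the latents would also carry a channel dimension, so a function like this would presumably be applied per channel to a reference brought to the latent resolution, and the result would be used as the initial noise. Those details are assumptions here and are not taken from the source.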
Key facts
- Φ-Noise is a training-free approach for motion-conditioned video generation.
- It injects low-frequency phase information from a reference video into diffusion noise latents.
- The method does not modify model architecture or inference pipeline.
- It achieves competitive or superior results compared to more complex conditioning approaches.
- The approach is motivated by findings on the importance of frequency components in generative models.
- It demonstrates effective control over both appearance and dynamics in generated videos.
- The authors demonstrate the method on several applications.
- The paper is categorized under Computer Vision and Pattern Recognition.
Entities
Institutions
- arXiv