SANA-Streaming: Real-Time Video Editing on Consumer GPUs
SANA-Streaming is a system-algorithm co-designed framework for high-quality, real-time streaming video-to-video editing on consumer GPUs. It targets two requirements of interactive applications such as gaming and live broadcasting: temporal consistency across frames and high inference throughput. The framework has three main components. First, a Hybrid Diffusion Transformer adds selective softmax attention to strengthen local modeling while retaining the efficiency of its linear layers. Second, Cycle-Reverse Regularization is a training approach that maintains semantic coherence by predicting the original frames from the generated content, which removes the need for paired long edited videos. Third, an Efficient System Co-design combines fused GDN kernels with mixed-precision computation. The paper is available on arXiv under ID 2605.30409.
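The idea behind Cycle-Reverse Regularization, predicting the original frames back from the generated content, can be illustrated with a toy cycle loss. This is only a sketch under stated assumptions: the summary does not give the actual loss, model, or weighting, so the mean squared error, the `edit_fn` and `reverse_fn` callables, and the list-of-floats frame representation are all hypothetical stand-ins.

```python
from typing import Callable, List

Frame = List[float]  # toy stand-in for a frame: a flat list of pixel values


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two frames of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cycle_reverse_loss(
    frames: List[Frame],
    edit_fn: Callable[[Frame], Frame],
    reverse_fn: Callable[[Frame], Frame],
) -> float:
    """Edit each frame, map the result back, and penalize the gap to the original.

    Only the unedited source frames are needed as supervision, so no paired
    edited video is required. MSE is an assumption made for this sketch.
    """
    edited = [edit_fn(f) for f in frames]
    recovered = [reverse_fn(f) for f in edited]
    return sum(mse(r, f) for r, f in zip(recovered, frames)) / len(frames)


# Hypothetical "edit" and "reverse" operations for the toy example.
def brighten(frame: Frame) -> Frame:
    return [p + 0.1 for p in frame]


def darken(frame: Frame) -> Frame:
    return [p - 0.1 for p in frame]


frames = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7]]
loss_good = cycle_reverse_loss(frames, brighten, darken)      # reverse undoes the edit
loss_bad = cycle_reverse_loss(frames, brighten, lambda f: f)  # reverse ignores the edit
print(loss_good, loss_bad)
```

In this toy setup the loss is near zero when the reverse step recovers the source frames and grows when the edited content has drifted from them, which is the kind of signal a regularizer of this type would rely on.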
Key facts
- SANA-Streaming enables real-time streaming video-to-video editing on consumer GPUs.
- It uses a Hybrid Diffusion Transformer that adds selective softmax attention for local modeling alongside efficient linear layers.
- Cycle-Reverse Regularization improves temporal consistency without paired long edited videos.
- Efficient System Co-design includes fused GDN kernels and mixed-precision computation.
- The framework targets interactive applications like live broadcasting and gaming.
- The paper is available on arXiv with ID 2605.30409.
- The approach addresses both temporal consistency and inference throughput.
- It is a system-algorithm co-designed framework.
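The contrast between softmax attention for local modeling and efficient linear layers can be sketched in a few lines. This is a minimal illustration, not the paper's architecture: it assumes the linear layers behave like causal linear attention with an `elu(x) + 1` feature map, and that the softmax attention is restricted to a sliding window. The summary does not specify how the two are combined or what the GDN kernels compute, so every function name and parameter here is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_softmax_attention(q, k, v, window: int) -> List[Vector]:
    """Causal softmax attention over the last `window` tokens (self included)."""
    out = []
    scale = 1.0 / math.sqrt(len(q[0]))
    for i in range(len(q)):
        start = max(0, i - window + 1)
        weights = softmax([dot(q[i], k[j]) * scale for j in range(start, i + 1)])
        out.append([
            sum(w * v[start + t][d] for t, w in enumerate(weights))
            for d in range(len(v[0]))
        ])
    return out


def feature_map(x: Vector) -> Vector:
    """elu(x) + 1: a positive feature map (an assumption for this sketch)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def causal_linear_attention(q, k, v) -> List[Vector]:
    """Causal linear attention with a fixed-size running state.

    The state S (dk x dv) and normalizer z (dk) are updated once per token,
    so the per-token cost does not grow with sequence length.
    """
    dk, dv = len(k[0]), len(v[0])
    S = [[0.0] * dv for _ in range(dk)]
    z = [0.0] * dk
    out = []
    for qi, ki, vi in zip(q, k, v):
        fq, fk = feature_map(qi), feature_map(ki)
        for a in range(dk):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * vi[b]
        denom = dot(fq, z)  # strictly positive because the feature map is positive
        out.append([sum(fq[a] * S[a][b] for a in range(dk)) / denom for b in range(dv)])
    return out


q = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
k = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
local_out = local_softmax_attention(q, k, v, window=2)
linear_out = causal_linear_attention(q, k, v)
```

The windowed softmax path recomputes weights over a bounded neighborhood, while the linear path carries a constant-size state across the stream. A hybrid design would mix layers of both kinds; the ratio and placement are not given in this summary.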
Entities
Institutions
- arXiv