Lumos-Nexus: Efficient Frequency Bridging for Video Unified Models
Researchers propose Lumos-Nexus, a training-efficient framework for video unified models that integrates reasoning-driven generation with high visual fidelity. The system uses a two-stage design: during training, a lightweight generator aligns with the understanding block to learn semantic control; during inference, Unified Progressive Frequency Bridging (UPFB) progressively hands off generation to a high-capacity pretrained generator in a shared latent space, enabling coarse-to-fine refinement without compromising reasoning. This approach addresses the computational bottleneck of integrating large high-fidelity generators into unified training loops. The paper is available on arXiv under identifier 2605.31603.
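The summary does not say how UPFB is implemented, so the following is only an illustrative sketch of the general idea of a frequency-based handoff between two generators that share a latent space. It assumes that low spatial-temporal frequencies are taken from the lightweight generator's latent and high frequencies from the pretrained generator's latent, and that the band owned by the lightweight generator narrows linearly over the inference steps. The names low_pass_mask, frequency_blend, and cutoff_schedule, the FFT-based blending, and the linear schedule are all hypothetical and are not taken from the paper.

```python
import numpy as np


def low_pass_mask(shape, cutoff):
    """Boolean mask that is True where the frequency radius is <= cutoff.

    Hypothetical helper. Frequencies come from np.fft.fftfreq and are in
    cycles per sample, so each axis spans [-0.5, 0.5).
    """
    grids = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    radius = np.sqrt(sum(g ** 2 for g in grids))
    return radius <= cutoff


def frequency_blend(semantic_latent, detail_latent, cutoff):
    """Combine two latents of identical shape by frequency band.

    Assumption: frequencies at or below `cutoff` come from the lightweight
    generator (semantic_latent); everything above comes from the pretrained
    high-capacity generator (detail_latent).
    """
    if semantic_latent.shape != detail_latent.shape:
        raise ValueError("latents must live in the same (shared) latent space")
    mask = low_pass_mask(semantic_latent.shape, cutoff)
    spectrum = np.where(mask, np.fft.fftn(semantic_latent), np.fft.fftn(detail_latent))
    # The mask is symmetric under frequency negation, so the result is real
    # up to floating-point error; .real drops the residual imaginary part.
    return np.fft.ifftn(spectrum).real


def cutoff_schedule(step, num_steps, start=0.5, end=0.05):
    """Assumed linear schedule: the lightweight generator's band shrinks
    from `start` to `end` as inference proceeds."""
    if num_steps <= 1:
        return end
    return start + (end - start) * step / (num_steps - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy latents with shape (frames, height, width); real latents would
    # come from the two generators.
    semantic = rng.standard_normal((4, 8, 8))
    detail = rng.standard_normal((4, 8, 8))
    for step in range(5):
        c = cutoff_schedule(step, 5)
        blended = frequency_blend(semantic, detail, c)
        print(step, round(c, 3), blended.shape)
```

In this sketch, a large cutoff reproduces the lightweight generator's latent unchanged, while a cutoff of zero keeps only its mean (the DC component) and takes all other content from the high-capacity generator. A real system would apply such a handoff inside the generators' sampling loops rather than to finished latents.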
Key facts
- Lumos-Nexus is a training-efficient unified video generation framework.
- It uses a two-stage design: lightweight generator alignment during training, UPFB during inference.
- UPFB stands for Unified Progressive Frequency Bridging.
- The framework enables high-fidelity video generation without compromising reasoning.
- The paper is on arXiv with ID 2605.31603.
- The approach addresses the computational bottleneck of integrating large high-fidelity generators into unified training loops.
- UPFB operates in a latent space shared by the lightweight and pretrained generators, which enables coarse-to-fine refinement.
- The lightweight generator learns reasoning-driven semantic control.
Entities
Institutions
- arXiv