AsymK-Talker: Real-Time Talking Head Generation via Asymmetric Kernel Distillation
AsymK-Talker is a diffusion-distillation method for generating talking heads in real time and over long horizons, described in an arXiv paper (2605.02948). It targets three limitations of current diffusion methods: inefficient causal inference, incompatibility with temporally coherent conditioning, and progressive drift over long sequences. The method has three components. Kernel-Conditioned Loop Generation (KCLG) generates video chunk by chunk and uses motion kernels to propagate motion consistently from one chunk to the next. Temporal Reference Encoding (TRE) converts a static identity reference into a time-aware latent representation to improve audio-visual synchronization. The third component is asymmetric kernel distillation. Together, they enable real-time, audio-driven talking head generation with improved temporal coherence and long-duration stability.
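The summary does not specify what a motion kernel contains or how KCLG conditions each chunk on it. The sketch below illustrates only the general idea of a chunk-wise loop with carried-over motion context, under a stated assumption: the "kernel" is simply the last `kernel_size` generated frames, and `generate_chunk` is a hypothetical stand-in for the distilled denoiser. Neither name comes from the paper.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # hypothetical latent frame: a flat feature vector


def loop_generate(
    audio_chunks: Sequence[Sequence[float]],
    init_kernel: List[Frame],
    generate_chunk: Callable[[List[Frame], Sequence[float]], List[Frame]],
    kernel_size: int,
) -> List[Frame]:
    """Chunk-wise loop generation (illustrative sketch, not the paper's KCLG).

    Assumption: the motion kernel is the last `kernel_size` frames produced
    so far, and each new chunk is conditioned on it.
    """
    if kernel_size <= 0:
        raise ValueError("kernel_size must be positive")
    kernel = list(init_kernel[-kernel_size:])
    output: List[Frame] = []
    for audio in audio_chunks:
        chunk = generate_chunk(kernel, audio)
        output.extend(chunk)
        # Propagate motion: the newest frames become the next chunk's kernel.
        kernel = (kernel + chunk)[-kernel_size:]
    return output


def toy_generator(kernel: List[Frame], audio: Sequence[float]) -> List[Frame]:
    """Hypothetical stand-in for a few-step denoiser: continues from the
    last kernel frame and shifts it by each audio feature."""
    prev = kernel[-1][0]
    frames: List[Frame] = []
    for a in audio:
        prev = prev + a
        frames.append([prev])
    return frames


if __name__ == "__main__":
    frames = loop_generate([[1.0, 1.0], [1.0, 1.0]], [[0.0]], toy_generator, 2)
    print(frames)  # [[1.0], [2.0], [3.0], [4.0]]
```

In the toy run the second chunk continues smoothly from the last frame of the first chunk, which is the kind of cross-chunk continuity the carried-over context is meant to provide.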
Key facts
- AsymK-Talker is a diffusion-distillation method for talking head generation
- Addresses inefficient causal inference, incompatibility with temporally coherent conditioning, and progressive drift
- Uses Kernel-Conditioned Loop Generation (KCLG) for chunk-wise generation
- Employs Temporal Reference Encoding (TRE) for audio-visual synchronization
- Published on arXiv with ID 2605.02948
- Focuses on real-time and long-horizon generation
- Announced on arXiv as a cross-list (announce type: cross)
- Leverages motion kernels for temporal consistency
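The summary states only that TRE turns a static identity reference into a time-aware latent; it gives no formula. One common way to make a static vector time-dependent is to broadcast it over frames and add a per-frame sinusoidal encoding. The sketch below shows that generic technique under this assumption; `temporal_reference` and `sinusoidal_encoding` are hypothetical names, and this is not claimed to be the paper's TRE.

```python
import math
from typing import List


def sinusoidal_encoding(t: int, dim: int) -> List[float]:
    """Transformer-style sinusoidal encoding of frame index t."""
    enc: List[float] = []
    for i in range(dim):
        freq = 1.0 / (10000 ** (2 * (i // 2) / dim))
        enc.append(math.sin(t * freq) if i % 2 == 0 else math.cos(t * freq))
    return enc


def temporal_reference(ref: List[float], num_frames: int) -> List[List[float]]:
    """Hypothetical TRE stand-in: repeat the static identity latent for each
    frame and add that frame's temporal encoding, so every frame receives a
    distinct, time-aware copy of the reference."""
    dim = len(ref)
    return [
        [r + e for r, e in zip(ref, sinusoidal_encoding(t, dim))]
        for t in range(num_frames)
    ]


if __name__ == "__main__":
    seq = temporal_reference([0.0, 0.0, 0.0, 0.0], 3)
    print(seq[0])  # [0.0, 1.0, 0.0, 1.0]
```

With a static reference alone, every frame would be conditioned on an identical vector. Adding a time-dependent term gives downstream attention layers a way to associate each frame's reference signal with the matching audio position.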
Entities
Institutions
- arXiv