Real-Time Diffusion Model Inference Optimized on Apple M3 Ultra
A team of researchers has demonstrated real-time camera image-to-image (img2img) transformation on Apple's M3 Ultra chip, which has a 60-core GPU and 512 GB of unified memory. They optimized diffusion model inference across 10 phases, exploring CoreML conversion, quantization, Token Merging, Neural Engine execution, compact models, frame interpolation, kNN search synthesis, pix2pix-turbo, optical flow frame skipping, and knowledge distillation. The final configuration combines CoreML conversion, the distillation-based model SDXS-512, and a 3-thread camera pipeline, reaching 22.7 FPS at 512x512 resolution. The work addresses a gap in optimization research for non-CUDA platforms such as Apple Silicon.
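A 3-thread camera pipeline of the kind mentioned above can be sketched as follows. The summary does not say how the three threads are divided, so the capture / inference / display split, the size-1 queues, and the drop-stale-frame policy are all assumptions made for illustration. Integers stand in for camera frames and `model` is a hypothetical placeholder for the converted diffusion model.

```python
import queue
import threading
import time

STOP = object()  # sentinel that tells a downstream stage to shut down


def put_latest(q, item):
    """Put item into a size-1 queue, discarding a stale item if one is waiting."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the frame nobody consumed in time
            except queue.Empty:
                pass  # consumer took it first; retry the put


def capture(frames_out, n_frames):
    # Assumption: stands in for a camera read loop; frames are just integers.
    for i in range(n_frames):
        put_latest(frames_out, i)
        time.sleep(0.001)
    put_latest(frames_out, STOP)


def infer(frames_in, results_out, model):
    # Always works on the most recent frame available.
    while True:
        frame = frames_in.get()
        if frame is STOP:
            results_out.put(STOP)
            return
        results_out.put(model(frame))


def display(results_in, shown):
    # Assumption: appending to a list stands in for drawing to the screen.
    while True:
        result = results_in.get()
        if result is STOP:
            return
        shown.append(result)


def run_pipeline(n_frames, model):
    frames = queue.Queue(maxsize=1)
    results = queue.Queue(maxsize=1)
    shown = []
    threads = [
        threading.Thread(target=capture, args=(frames, n_frames)),
        threading.Thread(target=infer, args=(frames, results, model)),
        threading.Thread(target=display, args=(results, shown)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown


if __name__ == "__main__":
    # Hypothetical "model": doubles the frame index.
    print(run_pipeline(20, lambda frame: frame * 2))
```

Dropping stale frames keeps latency bounded when inference is slower than capture: the output may skip frames, but it never falls behind the camera.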
Key facts
- Target platform: Apple M3 Ultra (60-core GPU, 512 GB unified memory)
- Goal: real-time camera img2img transformation
- 10 optimization phases explored
- Techniques explored: CoreML conversion, quantization, Token Merging, Neural Engine, compact models, frame interpolation, kNN search synthesis, pix2pix-turbo, optical flow frame skipping, knowledge distillation
- Final model: SDXS-512 with CoreML conversion and 3-thread camera pipeline
- Achieved 22.7 FPS at 512x512 resolution
- Research published on arXiv (2605.16259)
- Addresses a gap in optimization for non-CUDA platforms such as Apple Silicon
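To put the reported frame rate in perspective, the arithmetic below converts 22.7 FPS into a per-frame time budget and compares a sequential loop against a pipelined one. Only the 22.7 FPS figure comes from the source. The stage times in the example are hypothetical, illustrative numbers, not measurements from the paper, and the pipelined formula assumes the stages genuinely overlap.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def sequential_fps(stage_ms):
    """Throughput when the stages run one after another in a single loop."""
    return 1000.0 / sum(stage_ms)


def pipelined_fps(stage_ms):
    """Throughput when each stage runs in its own thread and fully overlaps.

    A pipeline is limited by its slowest stage, not by the sum of all stages.
    """
    return 1000.0 / max(stage_ms)


if __name__ == "__main__":
    print(f"{frame_budget_ms(22.7):.1f} ms per frame")  # 44.1 ms per frame

    # Hypothetical stage times (capture, inference, display), for illustration only.
    stages = [8.0, 44.0, 6.0]
    print(f"sequential: {sequential_fps(stages):.1f} FPS")
    print(f"pipelined:  {pipelined_fps(stages):.1f} FPS")
```

At 22.7 FPS each frame has roughly 44 ms end to end, which is why moving capture and display off the inference thread matters: in a pipelined design only the slowest stage has to fit within that budget.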
Entities
Institutions
- arXiv