Real-Time Diffusion Model Inference Optimized on Apple M3 Ultra

ai-technology · 2026-05-20

A team of researchers has demonstrated real-time camera image-to-image (img2img) transformation with a diffusion model running on Apple's M3 Ultra chip, configured with a 60-core GPU and 512 GB of unified memory. They optimized diffusion model inference across 10 phases, trying CoreML conversion, quantization, Token Merging, and the Neural Engine, among other techniques. They also evaluated compact models, frame interpolation, kNN search synthesis, pix2pix-turbo, optical flow frame skipping, and knowledge distillation. The final configuration combined CoreML conversion, SDXS-512 (a distillation-based model), and a 3-thread camera pipeline, reaching 22.7 FPS at 512x512 resolution. The work addresses a gap in optimization guidance for non-CUDA platforms such as Apple Silicon.
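The article does not describe how the 3-thread camera pipeline is built. A minimal sketch, assuming one thread each for capture, inference, and display, connected by single-slot queues that discard stale frames so inference always works on a recent image, might look like the following. The function names, the queue size, the frame-dropping policy, and `fake_model` (a stand-in for the CoreML-converted SDXS-512 model) are all hypothetical, not the researchers' code.

```python
import queue
import threading
import time


def put_latest(q, item):
    """Put item into a bounded queue, discarding the oldest entry if it is full,
    so the consumer always receives a recent frame (assumed policy)."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the stale item
            except queue.Empty:
                pass  # consumer took it first; retry the put


def capture_loop(frames_q, n_frames):
    """Thread 1: stand-in for the camera; emits frame ids 0..n_frames-1."""
    for frame_id in range(n_frames):
        put_latest(frames_q, frame_id)
    frames_q.put(None)  # end-of-stream sentinel (blocking put, never dropped)


def inference_loop(frames_q, results_q, model):
    """Thread 2: runs the (hypothetical) img2img model on each received frame."""
    while True:
        frame = frames_q.get()
        if frame is None:
            results_q.put(None)  # forward the sentinel to the display thread
            return
        put_latest(results_q, model(frame))


def display_loop(results_q, shown):
    """Thread 3: stand-in for the display; records what would be drawn."""
    while True:
        result = results_q.get()
        if result is None:
            return
        shown.append(result)


def run_pipeline(n_frames, model):
    """Start the three stage threads, wait for them, and return displayed results."""
    frames_q = queue.Queue(maxsize=1)
    results_q = queue.Queue(maxsize=1)
    shown = []
    threads = [
        threading.Thread(target=capture_loop, args=(frames_q, n_frames)),
        threading.Thread(target=inference_loop, args=(frames_q, results_q, model)),
        threading.Thread(target=display_loop, args=(results_q, shown)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shown


def fake_model(frame_id):
    """Hypothetical stand-in for the diffusion model: sleeps, then tags the frame."""
    time.sleep(0.002)
    return frame_id * 10


if __name__ == "__main__":
    shown = run_pipeline(50, fake_model)
    print(len(shown), shown[-1])
```

Because capture and display run in their own threads, the camera keeps producing frames while the model is busy, and intermediate frames may be skipped. The number of displayed frames therefore varies from run to run, but the final frame always reaches the display.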

Key facts

Target platform: Apple M3 Ultra (60-core GPU, 512 GB unified memory)
Goal: real-time camera img2img transformation
10 optimization phases explored
Techniques included CoreML conversion, quantization, Token Merging, Neural Engine, compact models, frame interpolation, kNN search, pix2pix-turbo, optical flow frame skipping, knowledge distillation
Final configuration: SDXS-512 with CoreML conversion and a 3-thread camera pipeline
Achieved 22.7 FPS at 512x512 resolution
Paper posted on arXiv (2605.16259)
Addresses a gap in optimization work for non-CUDA platforms
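The reported 22.7 FPS corresponds to roughly 44 ms per frame. A short calculation shows why splitting the work across threads can help. It assumes ideal pipelining, where steady-state throughput is set by the slowest stage rather than the sum of all stages, and it ignores contention for the shared GPU and unified memory. The stage latencies below are hypothetical illustrations, not measurements from the paper.

```python
def sequential_fps(stage_ms):
    """One thread runs capture, inference, and display back to back."""
    return 1000.0 / sum(stage_ms)


def pipelined_fps(stage_ms):
    """One thread per stage: ideal steady-state throughput is set by the slowest stage."""
    return 1000.0 / max(stage_ms)


frame_budget_ms = 1000.0 / 22.7  # time per frame at the reported 22.7 FPS

# Hypothetical latencies in ms: capture, inference, display
stages = [10, 40, 8]

print(round(frame_budget_ms, 2))          # 44.05
print(round(sequential_fps(stages), 1))   # 17.2
print(pipelined_fps(stages))              # 25.0
```

Pipelining raises throughput, not per-frame latency: in this idealized model a single frame still takes about the sum of the stage times (58 ms here) to travel from camera to screen.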
