LayerCache Framework Optimizes Flow Matching Inference Through Layer-Wise Caching
LayerCache is a caching framework that tackles the high inference cost of Flow Matching models for image generation. These models achieve state-of-the-art quality but require iterative denoising through large Transformer networks, which is computationally expensive. The study (arXiv preprint 2604.16492v1) shows that different layer groups within the Transformer exhibit markedly heterogeneous velocity dynamics: shallow layers are stable enough for aggressive caching, while deep layers undergo large velocity changes that demand full computation. Existing caching methods treat the Transformer as a monolithic unit, making a single caching decision per timestep and ignoring these layer-wise differences. LayerCache instead partitions the Transformer into layer groups and makes an independent caching decision for each group at every denoising step. It also introduces an adaptive JVP span K selection mechanism that uses per-group stability measurements to balance accuracy against computational savings, reducing inference cost without sacrificing image quality.
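The per-group decision described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: the `GroupCache` class, the relative-L2 stability test, and the threshold values are all assumptions standing in for whatever metric LayerCache actually uses.

```python
import math

class GroupCache:
    """Cache for one Transformer layer group; decides, per denoising
    step, whether to reuse the group's cached output or recompute it.
    The relative-L2 stability test is an illustrative assumption,
    not the paper's exact metric."""

    def __init__(self, threshold):
        self.threshold = threshold  # stability threshold for cache reuse
        self.cached = None          # most recent computed output
        self.prev = None            # output computed before that

    def run(self, compute, x):
        # Reuse when the last two computed outputs barely differed
        # (shallow, stable groups); otherwise run the layers in full.
        if self.cached is not None and self._stable():
            return self.cached
        out = compute(x)
        self.prev, self.cached = self.cached, out
        return out

    def _stable(self):
        if self.prev is None:
            return False
        num = math.sqrt(sum((a - b) ** 2 for a, b in zip(self.cached, self.prev)))
        den = math.sqrt(sum(b * b for b in self.prev)) or 1.0
        return num / den < self.threshold

# Toy denoising loop: a stable "shallow" group skips recomputation,
# while a volatile "deep" group recomputes at every step.
shallow = GroupCache(threshold=0.05)
deep = GroupCache(threshold=0.05)
shallow_calls, deep_calls = [], []

for t in range(6):
    shallow.run(lambda x: (shallow_calls.append(t), [1.0, 1.0])[1], None)
    deep.run(lambda x: (deep_calls.append(t), [float(t), 1.0])[1], None)
```

In this toy run the shallow group computes only twice across six steps and then serves from cache, while the deep group recomputes every step, mirroring the paper's observation that caching budgets should differ by depth.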
Key facts
- Flow Matching models achieve state-of-the-art image generation quality.
- These models incur substantial inference costs due to iterative denoising through large Transformer networks.
- Different Transformer layer groups exhibit markedly heterogeneous velocity dynamics.
- Shallow layers are highly stable and amenable to aggressive caching.
- Deep layers undergo large velocity changes that demand full computation.
- Existing caching methods treat the entire Transformer as a monolithic unit with a single caching decision per timestep.
- LayerCache is a layer-aware caching framework that partitions the Transformer into layer groups.
- LayerCache makes independent, per-group caching decisions at each denoising step.
- LayerCache introduces an adaptive JVP span K selection mechanism leveraging per-group stability measurements.
- The research is documented in arXiv preprint 2604.16492v1 (cross-listed).
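The adaptive JVP span K selection from the list above might look something like the sketch below: a per-group stability measurement is mapped to a reuse span K, the number of denoising steps over which a group's cached features are kept before full recomputation. The linear mapping, the `tau` scale, and the K range are illustrative assumptions, not the paper's selection rule.

```python
def adaptive_span(stability, k_min=1, k_max=8, tau=0.1):
    """Map a per-group stability measurement (lower = more stable) to a
    reuse span K. Stable groups get long spans; volatile groups get
    K = k_min (recompute every step). The linear interpolation and the
    tau scale are hypothetical."""
    if stability >= tau:
        return k_min                      # volatile group: no extended reuse
    frac = 1.0 - stability / tau          # 1.0 when perfectly stable
    return max(k_min, round(k_min + frac * (k_max - k_min)))

# Shallow (stable) groups receive long reuse spans; deep (volatile)
# groups fall back to per-step computation.
spans = {name: adaptive_span(s) for name, s in
         [("shallow", 0.0), ("middle", 0.02), ("deep", 0.3)]}
```

Under these assumed parameters, a perfectly stable group gets the maximum span of 8 steps while a group whose measurement exceeds `tau` gets a span of 1, which matches the accuracy/efficiency trade-off the mechanism is said to target.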
Entities
Institutions
- arXiv