Score-based diffusion models provably learn low-dimensional data distributions
A recent theoretical study shows that score-based diffusion models can learn data distributions with inherent low-dimensional structure, such as natural images, from a limited number of samples. The paper establishes finite-sample error bounds in Wasserstein-p distance for every p ≥ 1, assuming only finite moments of the target distribution μ; no compact-support, manifold, or smooth-density conditions are required. Given n i.i.d. samples from μ with a finite q-th moment, the convergence rate depends polynomially on the intrinsic dimension of the data rather than on the ambient dimension, which helps explain why diffusion models succeed on high-dimensional but low-complexity data such as images. The guarantees hold under mild regularity conditions on the forward diffusion process and the data distribution, and are presented as the first statistical guarantees that account for the intrinsic low-dimensional structure common in real-world datasets.
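To make the shape of such a result concrete, bounds in this literature often take the following schematic form (this is an illustrative sketch, not the paper's exact theorem; the exponent and constants here are assumptions):

```latex
% Schematic finite-sample guarantee (illustrative only).
% \hat{\mu}_n denotes the distribution produced by a diffusion model
% trained on n i.i.d. samples from \mu, and d is the intrinsic dimension.
\mathbb{E}\, W_p\bigl(\hat{\mu}_n, \mu\bigr) \;\lesssim\; n^{-c/d},
\qquad c = c(p, q) > 0
```

The point of the summarized result is that the exponent is governed by the intrinsic dimension d, while the ambient dimension enters only through milder polynomial factors hidden in the ≲.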
Key facts
- arXiv:2603.03700v2
- Score-based diffusion models
- Finite-sample error bounds in Wasserstein-p distance
- All p ≥ 1
- Finite-moment assumption on μ
- No compact-support, manifold, or smooth-density conditions
- Convergence rate depends on intrinsic dimension
- Mild regularity conditions on forward diffusion and data distribution
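The central quantity in the bounds above is the Wasserstein-p distance between the learned and target distributions. As a hedged illustration of what that distance measures (not code from the paper), the sketch below computes the empirical Wasserstein-p distance between two equal-size 1-D samples, where the optimal coupling is simply the sorted-order matching; the sample data and the helper name `wasserstein_p` are made up for this example.

```python
import numpy as np

def wasserstein_p(x, y, p=1):
    """Empirical Wasserstein-p distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan matches order statistics,
    so W_p reduces to the p-th root of the mean p-th power of sorted gaps.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape, "equal sample sizes assumed for simplicity"
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
mu = rng.normal(size=1000)    # stand-in for samples from the target μ
model = mu + 0.5              # stand-in "learned" samples, shifted by 0.5
print(wasserstein_p(mu, model, p=2))  # a translation by 0.5 gives W_p = 0.5
```

For higher-dimensional samples one would instead use an optimal-transport solver; the 1-D closed form is used here only because it keeps the example self-contained.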
Entities
Institutions
- arXiv