Machine Learning Models – Why De-Noising Diffusion Models Can Be Sampled with Gaussian Distributions

central-limit-theorem, gaussian-process, machine-learning, mathematical-statistics, random-walk

In de-noising diffusion models [1], the latent is typically sampled from a unit normal distribution, and the sample (e.g. an image) is then generated by iteratively removing noise during the backward (reverse) process. Conversely, in the forward (diffusion) process, the random Gaussian latent is obtained by iteratively adding Gaussian noise to the original image. So is the implication that this iterative addition of Gaussian noise to the image (in the forward process) eventually leads back to an approximately unit Gaussian distribution for the resulting random variable?

In other words, assuming the sample/image distribution itself is not Gaussian, is there some reason to expect that adding a large number of Gaussian random variables to an initial non-Gaussian random variable leads back to a unit Gaussian distribution, at least in the limit of this Gaussian random walk?

[1] Ho, J., Jain, A., & Abbeel, P. (2020). Denoising Diffusion Probabilistic Models.

Best Answer

We can compute the distribution of the noisy sample after $t$ iterations of the forward process in closed form, using equation 4 in the paper:

$$q(\mathbf{x}_t|\mathbf{x}_0) = \mathcal N(\mathbf{x}_t|\sqrt{\bar \alpha_t} \mathbf{x}_0, (1 - \bar \alpha_t)\mathbf I)$$

where $\mathbf{x}_0$ is the original sample, $\bar \alpha_t = \prod_{i=1}^t (1 - \beta_i)$, and $\beta_i \in (0,1)$ is the fixed variance level in the $i$-th iteration of the forward process.
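To make this concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper; the function name `sample_xt` is made up) that draws $\mathbf{x}_t$ directly from this closed form via the reparameterization $\mathbf{x}_t = \sqrt{\bar \alpha_t}\,\mathbf{x}_0 + \sqrt{1 - \bar \alpha_t}\,\boldsymbol\epsilon$ with $\boldsymbol\epsilon \sim \mathcal N(\mathbf 0, \mathbf I)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_xt(x0, alpha_bar_t):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)
    in a single step, using the reparameterization trick."""
    eps = rng.standard_normal(x0.shape)  # epsilon ~ N(0, I)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
```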

In DDPM, the number of iterations in the forward process is fixed to $T=1000$, and the variance levels $\beta_1, \dots, \beta_T$ increase linearly from $10^{-4}$ to $0.02$. Under this schedule, $\bar \alpha_T$ is approximately 0, so the latent distribution $q(\mathbf{x}_T|\mathbf{x}_0)$ is approximately a unit Gaussian $\mathcal N(\mathbf 0, \mathbf I)$, regardless of the original sample $\mathbf{x}_0$.
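As a quick numerical sanity check (again a sketch of mine, not code from the paper), we can compute $\bar \alpha_T$ under this linear schedule:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)    # linear variance schedule used in DDPM
alpha_bar = np.cumprod(1.0 - betas)   # alpha_bar_t = prod_{i<=t} (1 - beta_i)

print(alpha_bar[-1])                  # ~4e-5, i.e. essentially 0
# Hence q(x_T | x_0) = N(sqrt(alpha_bar_T) * x_0, (1 - alpha_bar_T) * I) ~= N(0, I),
# independent of x_0: the signal contribution has been scaled away and only noise remains.
```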