Why do we use Gaussian distributions in Variational Autoencoders?

Tags: gaussian-mixture-distribution, neural-networks, normal-distribution, variational-bayes, weights

I still don't understand why we force the distribution of the hidden representation of a Variational Autoencoder (VAE) to follow a multivariate normal distribution. Why this specific distribution and not another one?

This may be linked to another question: why does the weight distribution in a neural network follow a Gaussian distribution? Is it just an application of the Central Limit Theorem, which tells you that many independent inputs will generate many independent errors, and that the observed weights are the result of these multiple back-propagated signals…?

Best Answer

The normal distribution is not the only distribution used for latent variables in VAEs. There are also works that use the von Mises-Fisher distribution (hyperspherical VAEs [1]), and there are VAEs with Gaussian-mixture latents, which are useful for unsupervised [2] and semi-supervised [3] tasks.
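
With a Gaussian-mixture prior, each component can act as a cluster in the latent space, which is what makes it attractive for clustering and semi-supervised settings [2, 3]. Below is a minimal, purely illustrative NumPy sketch of sampling from such a prior; the mixture weights, component means, and scale are made-up values, not taken from those papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-component Gaussian-mixture prior over a 2-D latent space:
# mixture weights, per-component means, and a shared isotropic scale
# (all values are made up for illustration).
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
scale = 0.5

# Sampling from the mixture: first pick a component (the "cluster"),
# then sample a Gaussian around that component's mean.
component = rng.choice(len(weights), p=weights)
z = rng.normal(loc=means[component], scale=scale)

print("component:", component, "latent sample:", z)
```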

The normal distribution has many nice properties: the KL divergence term in the variational loss can be evaluated analytically, and we can use the reparametrization trick for efficient gradient computation (although the original VAE paper [4] lists several other distributions for which the trick also works). Moreover, one of the apparent advantages of VAEs is that they allow generation of new samples by sampling in the latent space, which is particularly easy when the latent space follows a Gaussian distribution. Finally, as @shimao remarked, it does not matter much which distribution the latent variables follow, since the non-linear decoder can turn it into an arbitrarily complicated distribution over the observations. The Gaussian is simply convenient.
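
To make these conveniences concrete, here is a minimal NumPy sketch (with made-up encoder outputs `mu` and `log_var`) of the Gaussian-specific pieces mentioned above: the closed-form KL term from the VAE objective [4], the reparametrization trick, and generation by sampling the standard-normal prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one data point: mean and log-variance of
# the diagonal Gaussian q(z|x) = N(mu, diag(sigma^2)) (values are made up).
mu = np.array([0.5, -1.2])
log_var = np.array([-0.3, 0.1])
sigma = np.exp(0.5 * log_var)

# Analytical KL term of the VAE loss for a diagonal Gaussian posterior and a
# standard-normal prior: 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2).
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

# Reparametrization trick: z = mu + sigma * eps with eps ~ N(0, I), so the
# sample is a deterministic, differentiable function of mu and sigma.
eps = rng.standard_normal(mu.shape)
z = mu + sigma * eps

# Generating new data only requires sampling the prior and decoding:
# z_new ~ N(0, I), x_new = decoder(z_new) (decoder not shown here).
z_new = rng.standard_normal(mu.shape)

print(f"KL term: {kl:.4f}")
print("reparametrized sample:", z)
print("prior sample for generation:", z_new)
```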

As for the second question, I agree with @shimao's answer.


[1]: Davidson, T.R., Falorsi, L., De Cao, N., Kipf, T. and Tomczak, J.M., 2018. Hyperspherical variational auto-encoders. arXiv preprint arXiv:1804.00891.

[2]: Dilokthanakul, N., Mediano, P.A., Garnelo, M., Lee, M.C., Salimbeni, H., Arulkumaran, K. and Shanahan, M., 2016. Deep unsupervised clustering with Gaussian mixture variational autoencoders. arXiv preprint arXiv:1611.02648.

[3]: Kingma, D.P., Mohamed, S., Rezende, D.J. and Welling, M., 2014. Semi-supervised learning with deep generative models. In Advances in neural information processing systems (pp. 3581-3589).

[4]: Kingma, D.P. and Welling, M., 2013. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114.
