Solved – Variational autoencoder with Gaussian mixture model

autoencoders, neural-networks, variational-bayes

A variational autoencoder (VAE) provides a way of learning the probability distribution $p(x,z)$ relating an input $x$ to its latent representation $z$. In particular, the encoder $e$ maps an input $x$ to a distribution over $z$. A typical encoder outputs parameters $(\mu,\sigma)=e(x)$, representing the Gaussian distribution $\mathcal{N}(\mu,\sigma)$; this distribution serves as our approximation to the posterior $p(z \mid x)$.
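For concreteness, here is a minimal sketch of such a Gaussian encoder head with the reparameterization trick, assuming a PyTorch setup; all layer sizes and names are illustrative, not taken from the question:

```python
# Minimal sketch (PyTorch assumed; dimensions and names are illustrative).
# The encoder maps x to (mu, log_var) of a diagonal Gaussian q(z|x),
# and z is sampled via the reparameterization trick.
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=16):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q(z|x)
        self.log_var = nn.Linear(h_dim, z_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = self.hidden(x)
        mu, log_var = self.mu(h), self.log_var(h)
        std = torch.exp(0.5 * log_var)
        z = mu + std * torch.randn_like(std)    # reparameterization trick
        return z, mu, log_var
```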

Has anyone considered a VAE where the output is a Gaussian mixture model, rather than a Gaussian? Is this useful? Are there tasks where this is significantly more effective than a simple Gaussian distribution? Or does it provide little benefit?

Best Answer

Yes, it has been done. The following paper implements something of that form:

Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders. Nat Dilokthanakul, Pedro A.M. Mediano, Marta Garnelo, Matthew C.H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan.

They experiment with using this approach for clustering. Each Gaussian in the mixture corresponds to a different cluster. Because the Gaussian mixture lives in the latent space ($z$) and a neural network decoder connects $z$ to $x$, the model can represent non-trivial, non-Gaussian clusters in the input space ($x$). See the sketch below for one way such a mixture-valued encoder can be wired up.
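The following is a simplified sketch of a mixture-of-Gaussians latent in PyTorch; it is an illustration of the general idea, not the exact GMVAE architecture from the paper, and all names and dimensions are assumptions. The encoder outputs cluster probabilities $q(c \mid x)$ and, for each of $K$ clusters, the parameters of a Gaussian over $z$:

```python
# Simplified sketch (PyTorch assumed; illustrative only, not the paper's exact model).
# The encoder produces mixture weights q(c|x) plus per-cluster Gaussian parameters,
# so each mixture component plays the role of a cluster in latent space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureEncoder(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=16, n_clusters=10):
        super().__init__()
        self.K, self.z_dim = n_clusters, z_dim
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.cluster_logits = nn.Linear(h_dim, n_clusters)   # q(c|x)
        self.mu = nn.Linear(h_dim, n_clusters * z_dim)        # per-cluster means
        self.log_var = nn.Linear(h_dim, n_clusters * z_dim)   # per-cluster log-variances

    def forward(self, x):
        h = self.hidden(x)
        pi = F.softmax(self.cluster_logits(h), dim=-1)        # mixture weights
        mu = self.mu(h).view(-1, self.K, self.z_dim)
        log_var = self.log_var(h).view(-1, self.K, self.z_dim)
        # Pick a cluster, then sample z from that cluster's Gaussian.
        # (Hard cluster sampling is not differentiable; practical implementations
        # typically marginalize over clusters or use a continuous relaxation.)
        c = torch.multinomial(pi, num_samples=1).squeeze(-1)
        idx = c.view(-1, 1, 1).expand(-1, 1, self.z_dim)
        mu_c = mu.gather(1, idx).squeeze(1)
        std_c = torch.exp(0.5 * log_var.gather(1, idx).squeeze(1))
        z = mu_c + std_c * torch.randn_like(std_c)            # reparameterized within cluster
        return z, pi, mu, log_var
```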

That paper also mentions the following blog post, which experiments with a different variation on that architecture: http://ruishu.io/2016/12/25/gmvae/

Thanks to shimao for pointing this out.
