Solved – VAE: why we do not sample again after decoding and before reconstruction loss

autoencoders, machine learning, variational-bayes

In many VAE schematics, and in the original paper, a sampling step is present after decoding and before the reconstruction loss, as shown in the image below. The image comes from Stanford CS231n.

VAE scheme

In many code implementations, though, this step is not present. For example, in the Keras implementation available here: https://keras.io/examples/variational_autoencoder/

In the latent space z they sample with the Lambda layer, but at the end of the decoder there is just a Dense layer with a sigmoid activation.
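
The relevant structure looks roughly like this (a minimal sketch of the linked example, not a verbatim copy; the layer sizes and variable names such as `original_dim` and `intermediate_dim` are my own assumptions):

```python
import tensorflow.keras as keras
from tensorflow.keras import layers
import tensorflow.keras.backend as K

original_dim, intermediate_dim, latent_dim = 784, 256, 2  # assumed MNIST-like sizes

# Encoder: maps x to the parameters (mean, log-variance) of q(z|x)
inputs = keras.Input(shape=(original_dim,))
h = layers.Dense(intermediate_dim, activation="relu")(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

# The only sampling step: a reparameterized draw z = mu + sigma * eps
def sampling(args):
    z_mean, z_log_var = args
    eps = K.random_normal(shape=K.shape(z_mean))
    return z_mean + K.exp(0.5 * z_log_var) * eps

z = layers.Lambda(sampling)([z_mean, z_log_var])

# Decoder: ends with a Dense + sigmoid layer; its output goes straight
# into the reconstruction loss, with no further sampling step.
h_dec = layers.Dense(intermediate_dim, activation="relu")(z)
x_decoded = layers.Dense(original_dim, activation="sigmoid")(h_dec)

vae = keras.Model(inputs, x_decoded)
```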

Is the sigmoid doing something I don't understand mathematically? Is the VAE math still valid without this sampling step?

It is not only in code implementations: in some other schematics and textual material the step also seems to be omitted (see the next image).

Second VAE scheme with no sampling

Best Answer

The most important point stems from the confusion that the tilde $\sim$ implies a sampling operation. But $\sim$ does not imply that something is sampled; sampling is an algorithmic/computational operation. The tilde merely indicates that a quantity is distributed according to some distribution.

Now, when we train a VAE, we want to get gradients of the ELBO. The form of the ELBO used in VAEs is typically

$$\mathcal{L} = \mathbb{E}_{z \sim q}\left[ \log p(x|z) \right] - \mathop{KL}\left[ q(z|x) || p(z)\right].$$ In its vanilla form (a Gaussian $q(z|x)$ and a standard normal prior $p(z)$), the KL term can be computed in closed form, and the remaining expectation is estimated efficiently with Monte Carlo samples from $q$.
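
Concretely, with the usual choice $q(z|x) = \mathcal{N}\left(\mu(x), \operatorname{diag}(\sigma^2(x))\right)$ (assumed here), most implementations optimise a single-sample estimator of this ELBO,

$$\hat{\mathcal{L}}(x) = \log p\!\left(x \mid z^{(1)}\right) - \mathop{KL}\left[ q(z|x) || p(z)\right], \qquad z^{(1)} = \mu(x) + \sigma(x) \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I),$$

where the reparameterised draw $z^{(1)}$ is exactly the Lambda-layer sample mentioned in the question.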

The first term, the reconstruction or likelihood term, can often be computed in closed form once $z$ is given, especially in the two most prevalent cases: a Bernoulli and a Gaussian log-likelihood.
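
To make the two cases concrete, here is a minimal NumPy sketch of the corresponding log-likelihoods (illustrative only; the function names are my own, and the Bernoulli case is, up to sign, exactly what a binary cross-entropy reconstruction loss computes):

```python
import numpy as np

def bernoulli_log_likelihood(x, p, eps=1e-7):
    """log p(x|z) when the decoder's sigmoid output p is the Bernoulli mean.
    Up to sign, this is the usual binary cross-entropy."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.sum(x * np.log(p) + (1.0 - x) * np.log(1.0 - p)))

def gaussian_log_likelihood(x, mu, log_var):
    """log p(x|z) when the decoder outputs a Gaussian mean (and log-variance)."""
    return float(-0.5 * np.sum(log_var + np.log(2.0 * np.pi)
                               + (x - mu) ** 2 / np.exp(log_var)))
```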

Hence, if $x|z \sim D$ with $D$ some tractable distribution, there is no need to sample from it: what we are interested in is $\log p(x|z)$, which is often tractable by itself.
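
For instance, in the Bernoulli case (a toy example with made-up numbers, in plain NumPy):

```python
import numpy as np

x = np.array([1.0, 0.0, 1.0])   # an observed binary vector
p = np.array([0.9, 0.2, 0.7])   # decoder(z): the sigmoid output, read as Bernoulli means

# The reconstruction term is log p(x|z) evaluated at the observed x;
# no sample of x is ever drawn from the decoder's distribution.
log_p_x_given_z = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
print(log_p_x_given_z)   # a plain number: minus the binary cross-entropy
```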
