Solved – Variational autoencoder: Why the reconstruction term is the same as squared loss

autoencoders, deep learning, inference, probability, variational-bayes

In the variational autoencoder (see paper), page 5, the loss function for the neural networks is defined as:

$L(\theta,\phi;x^{i})\backsimeq \frac{1}{2}\sum_{j=1}^J\left(1 + 2\log\sigma^i_j-(\mu^i_j)^2 - (\sigma^i_j)^2\right) + \frac{1}{L}\sum_{l=1}^L \log p_\theta(x^i|z^{i,l})$

While in the code, the second term $\frac{1}{L}\sum_{l=1}^L \log p_\theta(x^i|z^{i,l})$ is actually computed as
binary_crossentropy(x, x_output), where x and x_output are the input and the output of the autoencoder, respectively.
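For concreteness, here is a minimal NumPy sketch of how that term is typically computed; binary_crossentropy and decode are illustrative names standing in for the actual implementation, not the exact code referenced above:

```python
import numpy as np

def binary_crossentropy(x, x_output, eps=1e-7):
    """Per-example binary cross-entropy, summed over pixels.
    x, x_output: arrays of shape (batch_size, n_pixels) with entries in [0, 1]."""
    x_output = np.clip(x_output, eps, 1 - eps)
    return -np.sum(x * np.log(x_output) + (1 - x) * np.log(1 - x_output), axis=-1)

# With L Monte-Carlo samples z^(i,l) drawn by the encoder, the reconstruction term
# (1/L) * sum_l log p_theta(x^i | z^(i,l)) would be estimated as the negative mean of
# binary_crossentropy(x, decode(z_l)) over those samples (decode is a hypothetical
# decoder network; the usual implementations take L = 1).
```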

My question is: why is the cross-entropy loss between the input and the output equivalent to $\frac{1}{L}\sum_{l=1}^L \log p_\theta(x^i|z^{i,l})$?

Best Answer

For regular autoencoders, you start from an input $x$ and encode it to obtain your latent variable (or code) $z$, using some function that satisfies $z=f(x)$. After getting the latent variable, you aim to reconstruct the input using some other function, $\hat{x}=g(f(x))$. The reconstruction loss is yet another function, $L(x,\hat{x})$, which you back-propagate to update $f$ and $g$.
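As a minimal sketch of that setup (single linear layers in NumPy; shapes, initialization, and the dummy batch are chosen only for illustration, and a real autoencoder would be deeper and trained with an optimizer):

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = 0.01 * rng.normal(size=(784, 32))      # parameters of the encoder f
W_dec = 0.01 * rng.normal(size=(32, 784))      # parameters of the decoder g

def f(x):                                      # encoder: x -> z
    return np.tanh(x @ W_enc)

def g(z):                                      # decoder: z -> x_hat
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))  # sigmoid keeps x_hat in (0, 1)

def L(x, x_hat):                               # reconstruction loss L(x, x_hat)
    return np.mean(np.sum((x - x_hat) ** 2, axis=-1))

x = rng.uniform(size=(8, 784))                 # dummy batch standing in for images
loss = L(x, g(f(x)))                           # the quantity that gets back-propagated
```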

For Variational Autoencoders, you still interpret the latent variables, $z$, as your code. Hence, $p(x|z)$ serves as a probabilistic decoder, since given a code $z$, it produces a distribution over the possible values of $x$. It thus "makes sense" that the term $\log p_{\theta}(x|z)$ is somehow connected to reconstruction error.

Both the encoder and the decoder are deterministic functions. Since the decoder maps $z$ into $\hat{x}$, which in turn parameterizes $p(x|z)$, you can think of this expression as $p(x|\hat{x})$. When you assume (as they assumed in the paper, if I understood it correctly) that this distribution has a Gaussian form with mean $\hat{x}$ and fixed variance: $$ \log p(x|\hat{x}) \sim \log e^{-\|x-\hat{x}\|^2} = -\|x-\hat{x}\|^2 $$

Up to sign and constants, the last expression is the squared reconstruction error used in regular autoencoders, so maximizing $\log p_\theta(x|z)$ amounts to minimizing the squared error. Likewise, if $p_\theta(x|z)$ is taken to be Bernoulli with mean $\hat{x}$ (appropriate for inputs in $[0,1]$), then $\log p_\theta(x|z)$ is exactly the negative binary cross-entropy between $x$ and $\hat{x}$, which is why the code uses binary_crossentropy(x, x_output).
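A tiny numerical check of that correspondence (NumPy, with random vectors standing in for the data and the decoder output, and the constant term of the Gaussian log-density dropped):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=(5, 10))        # inputs
x_hat = rng.uniform(size=(5, 10))    # decoder outputs, read as Gaussian means

# Gaussian observation model with fixed unit variance (constant term dropped):
# log p(x | x_hat) = -0.5 * ||x - x_hat||^2 + const
log_p = -0.5 * np.sum((x - x_hat) ** 2, axis=-1)
squared_error = np.sum((x - x_hat) ** 2, axis=-1)

# The two differ only by a factor of -0.5, so maximizing log p_theta(x|z)
# is the same as minimizing the squared reconstruction error.
assert np.allclose(log_p, -0.5 * squared_error)
```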
