Neural Networks – Why does a Variational Autoencoder (Gaussian variational family) model $\log\sigma^2$ rather than $\sigma^2$ or $\sigma$?

generative-models, neural-networks, variational-bayes

In theory, the encoder of a VAE (assuming the variational family is Gaussian) outputs $\mu$ and $\sigma$ (or $\sigma^2$). In practice, however, I have seen people treat the output as $\log\sigma^2$. Why is this necessary or useful?

Best Answer

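It brings stability and ease of training. By definition, $\sigma$ has to be a positive real number. One way to enforce this would be to obtain its value through a ReLU function, but the ReLU's gradient is not defined at zero and is exactly zero for negative inputs, so a unit that outputs $\sigma = 0$ (which is not even a valid standard deviation) receives no gradient signal to recover. In addition, the standard deviation values are often very small, $1 \gg \sigma > 0$. The optimization then has to work with very small numbers, where finite-precision arithmetic, the poorly defined gradient, and terms such as $\log\sigma$ or $1/\sigma^2$ that blow up as $\sigma \to 0$ all cause numerical instabilities.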
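To make the ReLU problem concrete, here is a small plain-Python sketch (the helper names are hypothetical and not taken from any particular VAE implementation) that maps a raw network output to $\sigma$ in two ways: through a ReLU, and by treating the raw output as $\log\sigma^2$ and computing $\sigma = \exp(\tfrac12\log\sigma^2)$. It prints each resulting $\sigma$ together with its derivative with respect to the raw output.

```python
import math


def relu(x: float) -> float:
    # Candidate 1: sigma = max(0, raw output).
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # Subgradient convention: use 0 at x == 0, where the derivative is undefined.
    return 1.0 if x > 0 else 0.0


def sigma_from_logvar(logvar: float) -> float:
    # Candidate 2: interpret the raw output as log(sigma^2).
    # sigma = exp(log(sigma^2) / 2) is strictly positive for every real input.
    return math.exp(0.5 * logvar)


def sigma_from_logvar_grad(logvar: float) -> float:
    # d sigma / d logvar = 0.5 * exp(0.5 * logvar) = 0.5 * sigma, always > 0.
    return 0.5 * math.exp(0.5 * logvar)


for raw in (-3.0, -0.5, 0.0, 0.5):
    print(
        f"raw={raw:+.1f}  "
        f"relu: sigma={relu(raw):.4f} grad={relu_grad(raw):.1f}  "
        f"exp: sigma={sigma_from_logvar(raw):.4f} grad={sigma_from_logvar_grad(raw):.4f}"
    )
```

For negative raw outputs the ReLU returns $\sigma = 0$ with zero gradient, whereas the exponential route always yields $\sigma > 0$ with a gradient of $\tfrac12\sigma$, which never vanishes exactly.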

If you use the log transform, you map the very small values in the interval $(0, 1]$, where optimization is numerically fragile, to $(-\infty, 0]$, where you have a lot more room to work with. Computing $\log$ and $\exp$ is numerically stable and cheap, so you essentially gain room for your optimization variable to move in.
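The following sketch (plain Python, hypothetical helper names) illustrates this change of scale: standard deviations spread over several orders of magnitude become evenly spaced log-variances, and because $\partial\sigma/\partial\log\sigma^2 = \tfrac12\sigma$, a fixed additive step in $\log\sigma^2$ rescales $\sigma$ by the same factor, $\exp(\text{step}/2)$, however small $\sigma$ already is.

```python
import math


def logvar_from_sigma(sigma: float) -> float:
    # log(sigma^2) = 2 * log(sigma); only defined for sigma > 0.
    return 2.0 * math.log(sigma)


def sigma_from_logvar(logvar: float) -> float:
    # Transform back to the original space: sigma = exp(log(sigma^2) / 2).
    return math.exp(0.5 * logvar)


# Standard deviations spanning four orders of magnitude...
sigmas = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
# ...become evenly spaced log-variances (about -18.42, -13.82, -9.21, -4.61, 0.0).
logvars = [logvar_from_sigma(s) for s in sigmas]

# A fixed additive step in log-variance (e.g. one gradient update)
# multiplies sigma by exp(step / 2), independent of sigma's size.
step = -0.2
ratios = [sigma_from_logvar(lv + step) / sigma_from_logvar(lv) for lv in logvars]

for s, lv, r in zip(sigmas, logvars, ratios):
    print(f"sigma={s:.0e}  log(sigma^2)={lv:+.2f}  sigma ratio after step={r:.4f}")
```

By contrast, the same additive step of $-0.2$ applied directly to $\sigma = 10^{-4}$ would drive it negative.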

Please do not misunderstand: people do not use the $\log\sigma$ value as $\sigma$ itself, but always transform it back to the original space (e.g. $\sigma = \exp(\tfrac12\log\sigma^2)$). Also, in VAEs the Kullback-Leibler divergence term contains $\log\sigma$ (equivalently $\log\sigma^2 = 2\log\sigma$), so you need to compute it anyway.
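A minimal sketch of where the predicted log-variance is actually used, written in plain Python for self-containment (a real VAE would use a tensor library such as PyTorch or JAX, and the function names here are hypothetical). It assumes a diagonal Gaussian posterior and a standard normal prior: the log-variance is mapped back with $\exp(\cdot/2)$ for the reparameterized sample and plugged directly into the closed-form KL term $-\tfrac12\sum_j\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$.

```python
import math
import random


def sample_z(mu, logvar, rng):
    # Reparameterization: z = mu + sigma * eps, sigma = exp(logvar / 2), eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )
    #   = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2),
    # using log(sigma^2) directly and sigma^2 = exp(log(sigma^2)).
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


rng = random.Random(0)
mu = [0.3, -1.2]      # hypothetical encoder mean for one input
logvar = [-2.0, 0.5]  # hypothetical encoder log-variance for the same input
z = sample_z(mu, logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
print("z =", z, "KL =", kl)
```

Neither function needs a positivity constraint or clamping on the encoder output to keep $\sigma$ valid, which is one practical reason implementations let the network predict $\log\sigma^2$ directly.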