TensorFlow – Is MSE Loss a Valid ELBO Loss to Measure?

autoencoders, tensorflow

I am learning from an example in the TensorFlow documentation, https://www.tensorflow.org/tutorials/generative/cvae#define_the_loss_function_and_the_optimizer:

VAEs train by maximizing the evidence lower bound (ELBO) on the
marginal log-likelihood.

In practice, optimize the single sample Monte Carlo estimate of this
expectation: $\log p(x|z) + \log p(z) - \log q(z|x)$.

The loss function was implemented as:

import numpy as np
import tensorflow as tf


def log_normal_pdf(sample, mean, logvar, raxis=1):
  # Log density of a diagonal Gaussian, summed over the given axis.
  log2pi = tf.math.log(2. * np.pi)
  return tf.reduce_sum(
      -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi),
      axis=raxis)


def compute_loss(model, x):
  mean, logvar = model.encode(x)
  z = model.reparameterize(mean, logvar)
  x_logit = model.decode(z)
  # log p(x|z): Bernoulli likelihood, via sigmoid cross-entropy on the logits
  cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
  logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])
  logpz = log_normal_pdf(z, 0., 0.)          # log p(z): standard-normal prior
  logqz_x = log_normal_pdf(z, mean, logvar)  # log q(z|x): encoder posterior
  # Negative single-sample Monte Carlo estimate of the ELBO
  return -tf.reduce_mean(logpx_z + logpz - logqz_x)

Since this example uses the MNIST dataset, x is normalized to [0, 1], and sigmoid_cross_entropy_with_logits is used here.
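
If I understand correctly, this works because with a Bernoulli decoder whose logits are $\ell$ (x_logit above), the log-likelihood of a pixel vector $x$ is

$$ \log p_\theta(x|z) = \sum_i \Big[ x_i \log \sigma(\ell_i) + (1 - x_i) \log\big(1 - \sigma(\ell_i)\big) \Big], $$

which is exactly minus the summed sigmoid cross-entropy, so logpx_z computes $\log p(x|z)$.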

My question:

Another example (linked below) used MSE loss. Is MSE a valid ELBO loss to measure p(x|z)? And can we use other loss functions as the reconstruction loss in a VAE, such as the Huber loss (https://en.wikipedia.org/wiki/Huber_loss)?

https://www.tensorflow.org/guide/keras/custom_layers_and_models#putting_it_all_together_an_end-to-end_example

    # Iterate over the batches of the dataset.
    for step, x_batch_train in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            reconstructed = vae(x_batch_train)
            # Compute reconstruction loss
            loss = mse_loss_fn(x_batch_train, reconstructed)
            loss += sum(vae.losses)  # Add KLD regularization loss

        # Gradient step (elided in the snippet above, present in the guide)
        grads = tape.gradient(loss, vae.trainable_weights)
        optimizer.apply_gradients(zip(grads, vae.trainable_weights))

Best Answer

The Kingma and Welling paper is very readable and a good place to start understanding how and why VAEs work: Kingma, Diederik P., and Max Welling. "Auto-encoding variational Bayes." arXiv preprint arXiv:1312.6114 (2013).

"Another example used MSE loss (as follow), is MSE loss a valid ELBO loss to measure p(x|z)?"

Yes, MSE is a valid ELBO loss; it's one of the examples used in the paper. The authors write:

We let $p_\theta(\mathbf{x}|\mathbf{z})$ be a multivariate Gaussian (in case of real-valued data) or Bernoulli (in case of binary data) whose distribution parameters are computed from $\mathbf{z}$ with a MLP (a fully-connected neural network with a single hidden layer, see appendix C).

In other words, we can use any $p_\theta(x|z)$ we like; we just need to implement a network to decode $z$ into $x$ and then measure the loss according to $p_\theta$.
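
For instance, here is a minimal sketch of a Gaussian-decoder version of the tutorial's compute_loss, reusing its log_normal_pdf helper (the name compute_loss_gaussian and the fixed unit variance are assumptions for illustration, not part of the tutorial):

def compute_loss_gaussian(model, x):
  mean, logvar = model.encode(x)
  z = model.reparameterize(mean, logvar)
  x_mean = model.decode(z)  # decoder now outputs the Gaussian mean directly
  # log p(x|z) under N(x; x_mean, I); with unit variance this equals
  # -0.5 * ||x - x_mean||^2 plus a constant, i.e. MSE up to scale and shift
  logpx_z = log_normal_pdf(x, x_mean, 0., raxis=[1, 2, 3])
  logpz = log_normal_pdf(z, 0., 0.)
  logqz_x = log_normal_pdf(z, mean, logvar)
  return -tf.reduce_mean(logpx_z + logpz - logqz_x)

With the variance fixed at one, the reconstruction term is the MSE up to constants; the next paragraph makes that connection precise.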

Simple manipulation shows that minimizing the MSE loss is the same as maximizing the Gaussian likelihood with respect to the mean parameter. See: How do we get to the MSE in the loss function for a variational autoencoder?
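
Concretely, if the decoder defines $p_\theta(x|z) = \mathcal{N}\big(x;\, \mu_\theta(z),\, \sigma^2 I\big)$ with a fixed variance $\sigma^2$, then

$$ -\log p_\theta(x|z) = \frac{1}{2\sigma^2} \lVert x - \mu_\theta(z) \rVert^2 + \text{const}, $$

so minimizing the squared error between $x$ and the reconstruction $\mu_\theta(z)$ maximizes $\log p_\theta(x|z)$ up to constants.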

"Can we use other loss functions as a reconstruction loss in VAE, such as Huber loss?"

Yes, choosing the Huber loss corresponds to replacing $p_\theta(\mathbf{x}|\mathbf{z})$ with another density, specifically the Huber density. For the Huber loss given by

$$ H_\alpha(x) = \begin{cases} \frac{1}{2} x^2 & | x | \le \alpha \\ \alpha \left(|x| - \frac{1}{2}\alpha \right) & | x | > \alpha \end{cases} $$

we can work backwards from the corresponding likelihood to show that the probability density implied by the Huber loss is given by

$$ p_\theta(y) \propto \exp \left(-H_\alpha(y)\right). $$

But knowing this fact isn't strictly necessary from a practical standpoint -- you can simply replace MSE with the Huber loss.
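
For example, here is a minimal sketch of that swap in the Keras training loop quoted above. tf.keras.losses.Huber is the built-in Huber implementation, and its delta argument plays the role of $\alpha$; the names vae, train_dataset, and optimizer are assumed to be set up as in the guide:

import tensorflow as tf

# Swap the reconstruction term: Huber in place of MSE
huber_loss_fn = tf.keras.losses.Huber(delta=1.0)  # delta is the alpha threshold

for step, x_batch_train in enumerate(train_dataset):
    with tf.GradientTape() as tape:
        reconstructed = vae(x_batch_train)
        loss = huber_loss_fn(x_batch_train, reconstructed)
        loss += sum(vae.losses)  # KLD regularization term, unchanged

    grads = tape.gradient(loss, vae.trainable_weights)
    optimizer.apply_gradients(zip(grads, vae.trainable_weights))

The KL term and the rest of the training loop are untouched; only the density implied by the reconstruction loss changes.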