Solved – Choosing activation and loss functions in autoencoder

autoencoders, keras, loss-functions, neural networks

I am following this Keras tutorial to create an autoencoder using the MNIST dataset: https://blog.keras.io/building-autoencoders-in-keras.html.

However, I am confused by the choice of activation and loss for the simple one-layer autoencoder (the first example in the link). Is there a specific reason a sigmoid activation was used for the decoder, as opposed to something such as ReLU? I am trying to understand whether this is a choice I can play around with, or whether it should indeed be sigmoid, and if so, why. Similarly, I understand the loss is computed by comparing the original and reconstructed digits pixel by pixel, but I am unsure why the loss is binary cross-entropy as opposed to something like mean squared error.

I would love clarification on this to help me move forward! Thank you!

Best Answer

You are correct that MSE is often used as a loss in these situations. However, the Keras tutorial (like many guides that work with the MNIST dataset) normalizes all image inputs to the range [0, 1]. This occurs on the following two lines:

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

Note: as grayscale images, each pixel takes on an integer intensity between 0 and 255 inclusive, so dividing by 255 maps every pixel into [0, 1].
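
If you want a quick sanity check (assuming x_train was loaded with keras.datasets.mnist.load_data() and reshaped as in the tutorial), the normalized array should span exactly that range:

print(x_train.min(), x_train.max())  # 0.0 1.0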

Therefore, BCE loss is an appropriate function to use in this case: with pixel values in [0, 1], it treats each pixel as a (soft) binary target. Similarly, a sigmoid activation, which squashes its inputs to values between 0 and 1, matches the range of the normalized targets, whereas ReLU would leave the outputs unbounded above. You'll notice that under these conditions, when the decoded image is "close" to the original input image, the BCE loss will be small (strictly speaking, minimized; per-pixel BCE only reaches exactly zero when the target is 0 or 1). I found more information about this here.
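
For concreteness, here is a minimal sketch of the tutorial's first example showing where those two choices enter (the 32-unit bottleneck and the Adam optimizer follow the blog post; treat the rest as illustrative):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Single fully-connected autoencoder: 784 -> 32 -> 784.
input_img = keras.Input(shape=(784,))
encoded = layers.Dense(32, activation='relu')(input_img)
# Sigmoid keeps every reconstructed pixel in (0, 1), matching the normalized inputs.
decoded = layers.Dense(784, activation='sigmoid')(encoded)
autoencoder = keras.Model(input_img, decoded)

# Per-pixel BCE: -[t*log(p) + (1-t)*log(1-p)], averaged over the 784 pixels.
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.compile(optimizer='adam', loss='mse')  # also a reasonable choice to try

# Quick intuition for why BCE is small for a good reconstruction of a bright pixel (t = 1):
print(-np.log(0.95))  # ~0.05, prediction close to the target
print(-np.log(0.50))  # ~0.69, prediction far from the target

So yes, you can experiment with both choices: this is really a regression on pixel intensities rather than a true classification, and MSE will train fine. Sigmoid plus BCE is simply a natural pairing when the data live in [0, 1]; just note that with a ReLU output the reconstructions are no longer bounded above by 1.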