Solved – Activation functions for autoencoder performing regression

autoencoders, machine learning, neural networks

I want to train both a single-layer autoencoder and a multi-layer autoencoder in Keras to reconstruct an input with 24 features, all on the same scale with integer values from 0 to roughly 200000. My question is: what would be the best choice of activation function for each layer in both autoencoders?

In the Keras autoencoder blog post, ReLU is used for the hidden layer and sigmoid for the output layer. But using ReLU on my input would be the same as using a linear function, which would just approximate PCA. So what would be a better choice for learning nonlinear features?

The same goes for the multi-layer autoencoder. In the blog post, only ReLU is used for the hidden layers and sigmoid for the output layer. Are there better options for my training data?

I plan to use MSE as the loss function. Also, is it necessary to scale/normalize my input data, given that all features are already in the same range? And if so, which scaling would be best?

Best Answer

Since the activation is not applied directly to the input, but after the first linear transformation -- that is, the layer computes $\text{relu}(Wx)$ rather than $W\cdot \text{relu}(x)$ -- ReLU will give you the nonlinearities you want.

It also makes sense for the final activation to be ReLU in this case, because you are autoencoding non-negative values.
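
For concreteness, here is a minimal sketch of both variants in Keras, using ReLU in the hidden layers and on the output, with MSE as the loss. The layer sizes (16 and 8) are illustrative assumptions on my part, not something fixed by the question.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 24  # as described in the question

# Single-hidden-layer autoencoder: ReLU bottleneck, ReLU output
# (a ReLU output is fine here because the targets are non-negative).
single = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(8, activation="relu"),           # bottleneck
    layers.Dense(n_features, activation="relu"),  # reconstruction
])
single.compile(optimizer="adam", loss="mse")

# Multi-layer autoencoder: ReLU in every hidden layer, ReLU output.
deep = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu"),           # bottleneck
    layers.Dense(16, activation="relu"),
    layers.Dense(n_features, activation="relu"),
])
deep.compile(optimizer="adam", loss="mse")
```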

Scaling and normalization are still important: neural network weight initializations are designed with reasonably scaled inputs in mind, so keeping the inputs in a sensible range greatly eases optimization. A simple scaling of the inputs to around [0, 1] should do the trick.
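
A sketch of that scaling, reusing the `deep` model from above. The arrays `X_train` and `X_val` are placeholders standing in for your data, and per-feature min-max scaling is just one reasonable choice given that all features share the same range.

```python
import numpy as np

# Placeholder data with the shape and range described in the question.
X_train = np.random.randint(0, 200000, size=(1000, 24)).astype("float32")
X_val = np.random.randint(0, 200000, size=(200, 24)).astype("float32")

# Compute scaling statistics on the training set only, then reuse them
# for validation and at inference time.
x_min = X_train.min(axis=0)
x_max = X_train.max(axis=0)
X_train_scaled = (X_train - x_min) / (x_max - x_min + 1e-8)
X_val_scaled = (X_val - x_min) / (x_max - x_min + 1e-8)

# Autoencoder targets are the (scaled) inputs themselves.
deep.fit(X_train_scaled, X_train_scaled,
         validation_data=(X_val_scaled, X_val_scaled),
         epochs=50, batch_size=32)
```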