Solved – Using ReLU as the activation function in an autoencoder

autoencoders, deep learning, machine learning, neural networks

When implementing an autoencoder with a neural network, most people use the sigmoid as the activation function.

Can we use ReLU instead? (Since ReLU has no upper bound, the input image can contain pixel values greater than 1, unlike the restriction to [0, 1] that applies when sigmoid is used.)

Best Answer

Here's a discussion thread (from July 2013) indicating that there might be some issues with it, but it can be done.

Çağlar Gülçehre (from Yoshua Bengio's lab) said he successfully used the following technique in Knowledge Matters: Importance of Prior Information for Optimization:

Train the first DAE as usual, but with rectifiers in the hidden layer:

$a_1(x) = W_1 x + b_1$
$h_1 = f_1(x) = \mathrm{rectifier}(a_1(x))$
$g_1(h_1) = \mathrm{sigmoid}(V_1 h_1 + c_1)$

Minimize the cross-entropy or MSE loss, comparing $g_1(f_1(\mathrm{corrupt}(x)))$ and $x$. The sigmoid is optional depending on the data.
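A minimal sketch of that first stage in PyTorch (the class name `FirstDAE`, the Gaussian corruption, and the choice of MSE loss are illustrative assumptions, not details from the thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstDAE(nn.Module):
    """Denoising autoencoder with a rectifier (ReLU) hidden layer."""
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.enc = nn.Linear(n_in, n_hidden)   # W1, b1
        self.dec = nn.Linear(n_hidden, n_in)   # V1, c1

    def encode(self, x):
        return F.relu(self.enc(x))             # h1 = rectifier(a1(x))

    def forward(self, x_corrupt):
        h1 = self.encode(x_corrupt)
        return torch.sigmoid(self.dec(h1))     # g1(h1); sigmoid optional

def train_step(model, x, optimizer, noise_std=0.3):
    # Corrupt the input, reconstruct, compare against the clean input.
    x_corrupt = x + noise_std * torch.randn_like(x)
    x_hat = model(x_corrupt)
    loss = F.mse_loss(x_hat, x)                # or binary cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```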

Train the 2nd DAE with noise added before the $f_1$ rectifier, and use softplus reconstruction units with an MSE loss:

$h_2 = f_2(h_1) = \mathrm{rectifier}(W_2 h_1 + b_2)$
$g_2(h_2) = \mathrm{softplus}(V_2 h_2 + c_2)$

Minimize
$\lVert f_1(x) - g_2(f_2(\mathrm{rectifier}(\mathrm{corrupt}(a_1(x))))) \rVert^2 + \lambda_1 \lVert W \rVert_1 + \lambda_2 \lVert W \rVert_2$
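A sketch of the second stage, reusing the imports and `FirstDAE` above. Two points are my reading of the recipe rather than anything stated in the thread: the corruption of the pre-activation $a_1(x)$ is Gaussian, and the penalties are applied to the second encoder's weights $W_2$:

```python
class SecondDAE(nn.Module):
    """Second-layer DAE: rectifier encoder, softplus reconstruction."""
    def __init__(self, n_hidden1, n_hidden2):
        super().__init__()
        self.enc = nn.Linear(n_hidden1, n_hidden2)  # W2, b2
        self.dec = nn.Linear(n_hidden2, n_hidden1)  # V2, c2

    def forward(self, h1_corrupt):
        h2 = F.relu(self.enc(h1_corrupt))           # f2
        return F.softplus(self.dec(h2))             # g2

def train_step_2(first, second, x, optimizer,
                 noise_std=0.3, lambda1=1e-4, lambda2=1e-4):
    with torch.no_grad():                # first DAE is already trained
        a1 = first.enc(x)                # pre-activation a1(x)
        target = F.relu(a1)              # clean f1(x), the target
    # Add noise *before* the rectifier, then rectify.
    h1_corrupt = F.relu(a1 + noise_std * torch.randn_like(a1))
    recon = second(h1_corrupt)
    W = second.enc.weight
    loss = (F.mse_loss(recon, target, reduction="sum")
            + lambda1 * W.abs().sum()    # lambda_1 * ||W||_1
            + lambda2 * W.norm(p=2))     # lambda_2 * ||W||_2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```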

Xavier Glorot, also from the Bengio lab, said he did the same except for replacing $\lVert W \rVert_1$ with an $L_1$ penalty "on the activation values" (presumably $\lVert g_2(\dots) \rVert_1$?) in both Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach (ICML 2011) and in Deep sparse rectifier neural networks (AISTATS 2011).
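The thread leaves it ambiguous exactly which activations the $L_1$ penalty is applied to. The sketch below (reusing the classes above) penalizes the hidden rectifier activations $h_2$; penalizing the reconstruction $g_2(\dots)$ instead would be a one-line change:

```python
def activation_l1_loss(first, second, x, noise_std=0.3,
                       lambda1=1e-4, lambda2=1e-4):
    """Variant: L1 sparsity penalty on activation values instead of on W."""
    with torch.no_grad():
        a1 = first.enc(x)
        target = F.relu(a1)
    h1_corrupt = F.relu(a1 + noise_std * torch.randn_like(a1))
    h2 = F.relu(second.enc(h1_corrupt))          # hidden activations
    recon = F.softplus(second.dec(h2))
    return (F.mse_loss(recon, target, reduction="sum")
            + lambda1 * h2.abs().sum()           # L1 on activations
            + lambda2 * second.enc.weight.norm(p=2))
```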