Solved – the difference between a denoising autoencoder and a contractive autoencoder

autoencoders, deep-learning, graphical-model, machine-learning

Denoising Autoencoders (DAE) work by injecting noise into the input vector, mapping the corrupted input to the hidden layer, and then trying to reconstruct the original, uncorrupted vector. However, I fail to understand the intuition behind Contractive Autoencoders (CAE).
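To make the DAE part concrete, this is the kind of minimal setup I have in mind (a PyTorch-style sketch; the dimensions, noise level, and architecture are just illustrative):

```python
import torch
import torch.nn as nn

# Illustrative dimensions; nothing here is specific to a particular dataset.
input_dim, hidden_dim = 784, 64

encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

def dae_step(x_clean, noise_std=0.3):
    """One DAE training step: corrupt the input with Gaussian noise,
    but score the reconstruction against the clean input."""
    x_noisy = x_clean + noise_std * torch.randn_like(x_clean)
    x_hat = decoder(encoder(x_noisy))              # reconstruct from corrupted input
    loss = nn.functional.mse_loss(x_hat, x_clean)  # target is the clean vector
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# A random batch stands in for real data here.
print(dae_step(torch.rand(32, input_dim)))
```

The essential point is that the loss compares the reconstruction to the clean input, not to the corrupted one.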

Does the CAE mean that the hidden layer learns only the features that differentiate one input vector from the rest? If so, how could we reconstruct the original vector?

Best Answer

No. The CAE tries to make the encoder (i.e. the mapping from the input to the hidden layer) have the property of locality: small changes in the input lead to small changes in the hidden layer. This is a nice property because it means the mapping is not too sensitive, which should help it generalise beyond the training data.
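Concretely, the CAE enforces this by adding the squared Frobenius norm of the encoder's Jacobian, $\lVert J_f(x) \rVert_F^2$, to the reconstruction loss. Here is a minimal sketch of that penalty for a sigmoid encoder, where the norm has the closed form used by Rifai et al.; the dimensions, the weight `lam`, and the decoder are illustrative:

```python
import torch
import torch.nn as nn

input_dim, hidden_dim, lam = 784, 64, 0.1  # lam weights the contractive penalty

W = nn.Parameter(torch.randn(hidden_dim, input_dim) * 0.01)
b = nn.Parameter(torch.zeros(hidden_dim))
decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

def cae_loss(x):
    h = torch.sigmoid(x @ W.T + b)             # encoder f(x)
    x_hat = decoder(h)                         # reconstruction g(f(x))
    recon = nn.functional.mse_loss(x_hat, x)
    # Closed-form ||J_f(x)||_F^2 for a sigmoid encoder (Rifai et al.):
    # J_ji = h_j (1 - h_j) W_ji, so the squared Frobenius norm is
    # sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
    dh = (h * (1 - h)) ** 2                    # shape (batch, hidden)
    w2 = (W ** 2).sum(dim=1)                   # shape (hidden,)
    contractive = (dh * w2).sum(dim=1).mean()  # average over the batch
    return recon + lam * contractive

loss = cae_loss(torch.rand(32, input_dim))
loss.backward()  # gradients flow to W, b, and the decoder
```

Pushing the penalty towards zero is exactly what flattens the encoder's response to small input perturbations.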

(There is an extra complication: the contractive penalty, left to itself, would contract in every direction, but the reconstruction term keeps the representation sensitive along the low-dimensional manifold that all autoencoders assume is present in the input data. The net effect is that locality is enforced mainly in the directions orthogonal to that manifold; see the objective written out at the end of this answer.

My understanding of this is really based on the original paper (Rifai et al.). This video explains the directionality a bit differently.

I find this part harder to explain, so I suggest concentrating on the locality property first.)
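For reference, the CAE objective from Rifai et al., in notation I am choosing here ($f$ the encoder, $g$ the decoder, $L$ a reconstruction loss, $\lambda$ the penalty weight):

$$
\mathcal{J}_{\text{CAE}} \;=\; \sum_{x \in \mathcal{D}} L\bigl(x,\, g(f(x))\bigr) \;+\; \lambda\, \lVert J_f(x) \rVert_F^2,
\qquad
J_f(x) = \frac{\partial f(x)}{\partial x}.
$$

The penalty term on its own would be minimised by a constant encoder ($J_f = 0$, contraction in every direction); the reconstruction term resists this only where the training data actually vary, which is what singles out the manifold directions.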