Solved – Transfer Learning on Autoencoders

anomaly detection, machine learning, neural networks, transfer learning

I want to use the encoder of my autoencoder for feature extraction in an image anomaly detection framework.

For that reason, I thought that pretraining the autoencoder on a large dataset and then fine-tuning it on my target dataset would be a good idea. This idea crossed my mind because many anomaly detection approaches use CNN architectures such as VGG, ResNet, etc., pretrained on ImageNet, as feature extractors.

I did not find any papers on this matter, hence my question: is transfer learning actually used on autoencoders?

Best Answer

This approach is often used in NLP models, where it is called fine-tuning, and it is exactly what you are calling transfer learning. The recent debate around so-called foundation models revolves around this very concept.

Is transfer learning even really used on autoencoders?

Yes, it is possible: the foundation-model approach is precisely to pretrain a large vision model or autoencoder and then reuse it. However, VGG or ResNet are very small models to use as a foundation model compared to language models like GPT-3, both in parameter count and in training-data coverage.
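To make the pretrain-then-fine-tune idea concrete, here is a minimal NumPy sketch (not the CNN setup from the question; all function names and hyperparameters are illustrative): a single-hidden-layer linear autoencoder is pretrained on a large source set, its weights are then fine-tuned on a small target set, and finally the encoder is used for feature extraction while the reconstruction error serves as an anomaly score.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_hidden=8, lr=0.01, epochs=200, W_enc=None, W_dec=None):
    """Train a single-hidden-layer linear autoencoder with gradient descent.
    Passing existing weights continues training from them (fine-tuning)
    instead of starting from random initialization."""
    n_features = X.shape[1]
    if W_enc is None:
        W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
    if W_dec is None:
        W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))
    for _ in range(epochs):
        Z = X @ W_enc          # encode
        X_hat = Z @ W_dec      # decode
        err = X_hat - X        # reconstruction residual
        # gradients of the mean squared reconstruction error
        grad_dec = Z.T @ err / len(X)
        grad_enc = X.T @ (err @ W_dec.T) / len(X)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def anomaly_score(X, W_enc, W_dec):
    """Per-sample reconstruction error: high values flag potential anomalies."""
    X_hat = (X @ W_enc) @ W_dec
    return np.mean((X_hat - X) ** 2, axis=1)

# 1) Pretrain on a large source dataset (stand-in for ImageNet-scale data)
X_source = rng.normal(size=(500, 16))
W_enc, W_dec = train_autoencoder(X_source)

# 2) Fine-tune on a small target dataset (the transfer-learning step)
X_target = rng.normal(size=(50, 16))
W_enc, W_dec = train_autoencoder(X_target, W_enc=W_enc, W_dec=W_dec, epochs=50)

# 3) Use the encoder for feature extraction and score anomalies
features = X_target @ W_enc
scores = anomaly_score(X_target, W_enc, W_dec)
```

In a real pipeline the linear layers would be replaced by a convolutional encoder/decoder and the gradient descent by a framework optimizer, but the transfer step is the same: initialize from the pretrained weights rather than from scratch.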
