Ordinary autoencoder architectures (as opposed to variational autoencoders, stacked denoising autoencoders, etc.) seem to have only three layers: the input, the hidden/code layer, and the output/reconstruction. Are there any examples of papers that use architectures with multiple hidden layers? If not, what is the theoretical justification for using only one hidden layer in an autoencoder?
Solved – Architecture of autoencoders
autoencoders
Best Answer
Yes: look for "deep autoencoders", a.k.a. "stacked autoencoders", such as {1}.
Hugo Larochelle has a video on it: Neural networks [7.6] : Deep learning - deep autoencoder
Geoffrey Hinton also has a video on it: Lecture 15.2 — Deep autoencoders [Neural Networks for Machine Learning]
Examples of deep autoencoders that don't use pretraining: http://ufldl.stanford.edu/wiki/index.php/Stacked_Autoencoders
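To make "multiple hidden layers" concrete, here is a minimal numpy sketch (not taken from any of the cited sources; the layer sizes, learning rate, and toy data are all illustrative) of a deep autoencoder with an 8 → 4 → 2 → 4 → 8 architecture, trained end-to-end with plain backpropagation and no pretraining:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative layer sizes: 8 -> 4 -> 2 (code) -> 4 -> 8.
sizes = [8, 4, 2, 4, 8]
Ws = [rng.normal(0.0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

X = rng.random((64, 8))  # toy data in [0, 1)

def forward(X):
    acts = [X]
    for W, b in zip(Ws, bs):
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts  # acts[2] is the 2-d code, acts[-1] the reconstruction

def mse(X):
    return float(np.mean((forward(X)[-1] - X) ** 2))

loss_before = mse(X)
lr = 0.5
for _ in range(500):
    acts = forward(X)
    # Backprop squared error through all layers jointly (no pretraining).
    delta = (acts[-1] - X) * acts[-1] * (1 - acts[-1])  # sigmoid'(a) = a(1-a)
    for i in range(len(Ws) - 1, -1, -1):
        grad_W = acts[i].T @ delta / len(X)
        grad_b = delta.mean(axis=0)
        if i > 0:  # propagate the error before updating this layer's weights
            delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])
        Ws[i] -= lr * grad_W
        bs[i] -= lr * grad_b
loss_after = mse(X)
```

The only difference from the three-layer case is that the encoder and decoder each stack several nonlinear layers; the gradients still flow through the whole network at once.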
E.g., {2} uses a stacked autoencoder with greedy layer-wise training.
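Greedy layer-wise training, by contrast, trains one shallow autoencoder at a time and feeds its codes to the next. A rough sketch of that procedure, again in plain numpy with illustrative sizes (not the exact setup of {2}):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_shallow_ae(X, code_dim, epochs=300, lr=0.5):
    """Train a one-hidden-layer autoencoder X -> code -> X in isolation;
    return the encoder parameters and the codes for X."""
    n_in = X.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, code_dim)); b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.5, (code_dim, n_in)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # codes
        Y = sigmoid(H @ W2 + b2)          # reconstruction
        d2 = (Y - X) * Y * (1 - Y)        # output-layer error
        d1 = (d2 @ W2.T) * H * (1 - H)    # hidden-layer error
        W2 -= lr * H.T @ d2 / len(X); b2 -= lr * d2.mean(axis=0)
        W1 -= lr * X.T @ d1 / len(X); b1 -= lr * d1.mean(axis=0)
    return (W1, b1), sigmoid(X @ W1 + b1)

X = rng.random((64, 8))  # toy data
layers = []
inp = X
for code_dim in (4, 2):  # greedy: one layer at a time, each on the previous codes
    params, inp = train_shallow_ae(inp, code_dim)
    layers.append(params)
# `layers` now holds a pretrained 8 -> 4 -> 2 encoder stack, which can then
# be unrolled into a deep autoencoder and fine-tuned end-to-end.
```

Each shallow autoencoder sees only a single-hidden-layer problem, which is what made this trick useful before end-to-end training of deep networks became reliable.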
Note that autoencoders can also be built from architectures fancier than fully connected feedforward networks, e.g. {3}.
References: