In "Reducing the Dimensionality of Data with Neural Networks" (Science, 2006), Hinton and Salakhutdinov proposed a non-linear PCA through the use of a deep autoencoder. I have tried to build and train a PCA autoencoder with Tensorflow several times, but I have never been able to obtain better results than linear PCA.
How can I efficiently train an autoencoder?
(Later edit by @amoeba: the original version of this question contained Python Tensorflow code that did not work correctly. One can find it in the edit history.)
Best Answer
Here is the key figure from the 2006 Science paper by Hinton and Salakhutdinov:
It shows dimensionality reduction of the MNIST dataset ($28\times 28$ black and white images of single digits) from the original 784 dimensions to two.
Let's try to reproduce it. I will not be using Tensorflow directly, because it's much easier to use Keras (a higher-level library running on top of Tensorflow) for simple deep learning tasks like this. H&S used a $$784\to 1000\to 500\to 250\to 2\to 250\to 500\to 1000\to 784$$ architecture with logistic units, pre-trained with a stack of Restricted Boltzmann Machines. Ten years later, this sounds very old-school. I will use a simpler $$784\to 512\to 128\to 2\to 128\to 512\to 784$$ architecture with exponential linear units and without any pre-training. I will use the Adam optimizer (a particular implementation of adaptive stochastic gradient descent with momentum).
The code is copy-pasted from a Jupyter notebook. In Python 3.6 you need to install matplotlib (for pylab), NumPy, seaborn, TensorFlow, and Keras. When running in a Python shell, you may need to add
plt.show()
to show the plots.

Initialization
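A minimal sketch of the setup, assuming the standard keras.datasets.mnist loader with pixel values scaled to $[0,1]$ (the variable names x_train, x_test, y_train carry through the later snippets):

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

from keras.datasets import mnist
from keras.models import Sequential, Model
from keras.layers import Dense
from keras.optimizers import Adam

# Load MNIST and flatten each 28x28 image into a 784-dim vector in [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = (x_train.reshape(-1, 784) / 255.0).astype('float32')
x_test  = (x_test.reshape(-1, 784) / 255.0).astype('float32')
```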
PCA
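PCA with two components can be done via SVD of the centered data; a sketch (Zpca and Rpca are names I reuse below, and the error is the per-pixel mean squared error so that it is comparable to the Keras loss):

```python
# Center the data and get principal directions from the SVD (rows of V)
mu = x_train.mean(axis=0)
U, s, V = np.linalg.svd(x_train - mu, full_matrices=False)
Zpca = np.dot(x_train - mu, V.T)          # scores for all components

# Reconstruct using only the first two principal components
Rpca = np.dot(Zpca[:, :2], V[:2, :]) + mu
err = np.sum((x_train - Rpca)**2) / Rpca.shape[0] / Rpca.shape[1]
print('PCA reconstruction error with 2 PCs:', round(err, 3))
```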
This prints the PCA reconstruction error with two components; that is the baseline loss for the autoencoder to beat.
Training the autoencoder
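A sketch of a Keras Sequential model implementing the $784\to 512\to 128\to 2\to 128\to 512\to 784$ architecture described above (the linear bottleneck, sigmoid output layer, batch size of 128, and five epochs are my assumptions; the bottleneck layer is named so the encoder can be extracted later):

```python
m = Sequential()
m.add(Dense(512, activation='elu', input_shape=(784,)))
m.add(Dense(128, activation='elu'))
m.add(Dense(2,   activation='linear', name='bottleneck'))  # 2D code layer
m.add(Dense(128, activation='elu'))
m.add(Dense(512, activation='elu'))
m.add(Dense(784, activation='sigmoid'))                    # pixels are in [0, 1]
m.compile(loss='mean_squared_error', optimizer=Adam())

# Train the network to reproduce its own input
history = m.fit(x_train, x_train, batch_size=128, epochs=5, verbose=1,
                validation_data=(x_test, x_test))
```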
This takes ~35 seconds on my work desktop and prints the training and validation losses for each epoch, so you can already see that we surpassed the PCA loss after only two training epochs.
(By the way, it is instructive to change all activation functions to activation='linear' and to observe how the loss converges precisely to the PCA loss. That is because a linear autoencoder is equivalent to PCA.)

Plotting PCA projection side-by-side with the bottleneck representation
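To get the bottleneck representation, extract an encoder sub-model ending at the named bottleneck layer and scatter-plot both 2D embeddings colored by digit label. A sketch (plotting only 5000 points and the choice of colormap are arbitrary):

```python
# Encoder: maps inputs to the 2D bottleneck; also compute full reconstructions
encoder = Model(m.input, m.get_layer('bottleneck').output)
Zenc = encoder.predict(x_train)   # bottleneck representation
Renc = m.predict(x_train)         # autoencoder reconstruction

plt.figure(figsize=(8, 4))
plt.subplot(121)
plt.title('PCA')
plt.scatter(Zpca[:5000, 0], Zpca[:5000, 1], c=y_train[:5000], s=8, cmap='tab10')
plt.gca().set_xticklabels([]); plt.gca().set_yticklabels([])
plt.subplot(122)
plt.title('Autoencoder')
plt.scatter(Zenc[:5000, 0], Zenc[:5000, 1], c=y_train[:5000], s=8, cmap='tab10')
plt.gca().set_xticklabels([]); plt.gca().set_yticklabels([])
plt.tight_layout()
```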
Reconstructions
And now let's look at the reconstructions (first row - original images, second row - PCA, third row - autoencoder):
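A sketch of the comparison plot, using the first ten training images (toPlot is a helper tuple I introduce here):

```python
plt.figure(figsize=(9, 3))
toPlot = (x_train, Rpca, Renc)   # rows: original, PCA, autoencoder
for i in range(10):
    for j in range(3):
        ax = plt.subplot(3, 10, 10 * j + i + 1)
        plt.imshow(toPlot[j][i, :].reshape(28, 28), interpolation='nearest',
                   vmin=0, vmax=1, cmap='gray')
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
plt.tight_layout()
```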
One can obtain much better results with a deeper network, some regularization, and longer training. Experiment. Deep learning is easy!