Solved – Why would I ever use a linear autoencoder for dimensionality reduction

Tags: autoencoders, dimensionality reduction, pca

Following on from: What're the differences between PCA and autoencoder?

If I want to do dimensionality reduction and restrict myself to a linear activation for my autoencoder, is there any reason to use an AE over Principal Component Analysis (PCA)? PCA minimises the squared distance between the data and its reconstruction from some subspace, and it has a closed-form solution, so it is very fast. A linear AE takes a very roundabout route to the same goal, and its solution can be no better: the optimised loss is at best equal to that achieved by PCA.
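
To make that claim concrete, here are the two objectives side by side, in my own notation (not from the linked posts): $X$ is the centred $n \times d$ data matrix and $k$ the target dimension.

$$\text{PCA:}\quad \min_{W \in \mathbb{R}^{d \times k},\; W^\top W = I_k} \lVert X - X W W^\top \rVert_F^2$$

$$\text{Linear AE:}\quad \min_{W_1 \in \mathbb{R}^{d \times k},\; W_2 \in \mathbb{R}^{k \times d}} \lVert X - X W_1 W_2 \rVert_F^2$$

Since $X W_1 W_2$ has rank at most $k$, the Eckart-Young theorem says the linear AE can at best match the rank-$k$ PCA reconstruction, never beat it.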

So I conclude that I should only use an autoencoder for dimensionality reduction if I want a non-linear activation function, multiple layers, a convolutional layer, etc. Is this a fair conclusion?

Best Answer

Using a linear autoencoder instead of PCA can also be useful in a large-scale learning scenario. Since the AE can be trained with Stochastic Gradient Descent (SGD) on mini-batches, there is no need to load all the training samples into main memory at once, which can be problematic for large-scale problems.

The linear AE also comes in handy in online-learning scenarios, where training examples arrive over time: SGD handles this naturally by updating on each new example (or small batch) as it comes in, as the sketch below illustrates.
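
A minimal sketch of that idea, assuming NumPy, untied encoder/decoder weights, plain squared-error loss, and toy data and hyper-parameters of my own choosing:

```python
# Minimal sketch: a linear autoencoder trained with mini-batch SGD.
# Names, data, and hyper-parameters are illustrative, not from the answer.
import numpy as np

rng = np.random.default_rng(0)

d, k = 50, 5        # input dimension, bottleneck (code) dimension
lr = 0.05           # learning rate
n_epochs = 30
batch_size = 32     # set to 1 for the purely online variant

# Toy low-rank data standing in for a stream too big to hold in memory.
T = rng.normal(size=(10_000, k))                  # latent factors
X = T @ rng.normal(size=(k, d))                   # embed into d dimensions
X += 0.01 * rng.normal(size=X.shape)              # small noise
X -= X.mean(axis=0)                               # centre, as PCA does

# Untied encoder and decoder weights.
W1 = rng.normal(scale=0.1, size=(d, k))           # encoder
W2 = rng.normal(scale=0.1, size=(k, d))           # decoder

for epoch in range(n_epochs):
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]          # only this batch in memory
        z = xb @ W1                               # encode
        err = z @ W2 - xb                         # reconstruction error
        # Gradients of the mean squared error w.r.t. W2 and W1.
        g2 = 2 * z.T @ err / xb.size
        g1 = 2 * xb.T @ err @ W2.T / xb.size
        W1 -= lr * g1
        W2 -= lr * g2

print("final reconstruction MSE:", np.mean((X @ W1 @ W2 - X) ** 2))
```

Setting `batch_size` to 1 turns this into the purely online variant described above: each arriving example triggers one gradient update and can then be discarded.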

Another option is an incremental version of PCA, e.g. scikit-learn's IncrementalPCA: http://scikit-learn.org/stable/auto_examples/decomposition/plot_incremental_pca.html
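
For completeness, a minimal sketch of that route; the toy array `X` is my own stand-in for a dataset too large to fit in memory in one go:

```python
# IncrementalPCA fits in chunks via partial_fit, so the full dataset
# never has to sit in memory at once.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 50))    # toy stand-in for a large dataset

ipca = IncrementalPCA(n_components=5)
for start in range(0, len(X), 1_000):
    ipca.partial_fit(X[start:start + 1_000])    # one chunk at a time

X_reduced = ipca.transform(X)        # project onto the learned components
print(X_reduced.shape)               # (10000, 5)
```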