Neural Networks – Are Dimensionality Reduction Techniques Useful in Deep Learning?

Tags: dimensionality-reduction, neural-networks, pca, tsne

I have been working in machine learning and noticed that dimensionality reduction techniques like PCA and t-SNE are used most of the time. However, I have rarely seen anyone use them in deep learning projects. Is there a specific reason for not using dimensionality reduction techniques in deep learning?

Best Answer

$t$-SNE

Two obvious reasons that $t$-SNE is not commonly used as a dimension-reduction method are that it is non-deterministic and that it cannot be applied in a consistent fashion to test-set data. See: Are there cases where PCA is more suitable than t-SNE?
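A quick sketch of the test-set problem, assuming scikit-learn: PCA learns a projection that can be reused on new data, while `TSNE` only produces an embedding of the data it was fit on and exposes no `transform` method for held-out points.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 10))
X_test = rng.normal(size=(20, 10))

# PCA: fit on training data, then map test data with the SAME projection.
pca = PCA(n_components=2).fit(X_train)
Z_test = pca.transform(X_test)

# t-SNE: only the training data can be embedded, and reruns without a
# fixed random_state generally give different embeddings.
tsne = TSNE(n_components=2, perplexity=10, random_state=0)
Z_train = tsne.fit_transform(X_train)
print(hasattr(tsne, "transform"))  # no consistent map for new data
```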

PCA

First, PCA is not inherently a dimensionality reduction method. It produces a new matrix of the same size, represented in a decorrelated basis. Truncated PCA reduces the rank of that matrix, and thereby its dimension.
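This distinction is easy to see with scikit-learn (a sketch on synthetic data): full PCA is just a change of basis, so the output has the same shape as the input with empirically uncorrelated columns, while truncated PCA actually drops dimensions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# Full PCA: same shape as X -- a rotation into a decorrelated basis.
full = PCA().fit_transform(X)
# Truncated PCA: keeps only the leading components.
truncated = PCA(n_components=3).fit_transform(X)

print(full.shape)       # (200, 8)
print(truncated.shape)  # (200, 3)

# The columns of the full transform are uncorrelated (covariance is diagonal).
cov = np.cov(full, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print(np.allclose(off_diag, 0, atol=1e-8))
```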

Second, even if you do not use PCA to reduce dimensionality, it can still be useful. In "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", Sergey Ioffe and Christian Szegedy suggest that whitening transformations are helpful during optimization.

It has been long known (LeCun et al., 1998b; Wiesler & Ney, 2011) that the network training converges faster if its inputs are whitened – i.e., linearly transformed to have zero means and unit variances, and decorrelated.

Clearly, PCA yields decorrelated vectors, and subtracting the mean and rescaling by the standard deviation achieve the rest. This quotation suggests that pre-whitening the input data might give your model a nice boost in training time.
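A minimal sketch of this, assuming scikit-learn: `PCA(whiten=True)` produces exactly the kind of input the quotation describes, with zero means, unit variances, and decorrelated features, even when the raw inputs are strongly correlated.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Correlated inputs: the second feature is a noisy copy of the first.
x1 = rng.normal(size=500)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=500), rng.normal(size=500)])

W = PCA(whiten=True).fit_transform(X)

print(np.allclose(W.mean(axis=0), 0))            # zero mean
print(np.allclose(W.std(axis=0), 1, atol=1e-2))  # ~unit variance
# Off-diagonal correlations vanish: the features are decorrelated.
print(np.allclose(np.corrcoef(W, rowvar=False) - np.eye(3), 0, atol=1e-8))
```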

Whether or not whitening is helpful for any particular model is, obviously, problem-specific. One very common deep learning application is image classification with convolutional networks. These networks tend not to use whitening transformations because the transformation to an orthogonal basis changes the image in a way which might not actually be useful to whatever network you're using. I'm not aware of an example where PCA improves a modern deep neural network for image classification, but that's probably due to a limitation of my knowledge; I'm sure someone will post a recent paper that uses PCA in a comment.

Moreover, truncated PCA of an image will, obviously, distort the image in some way, with the amount of distortion depending on the number of PCs that you retain.

On the other hand, a great reason to use truncated PCA for dimensionality reduction is when your data is rank-deficient. It's common for hand-crafted feature vectors, such as those used in a feed-forward network, to have a certain amount of redundancy. Presenting all of these features to your network unnecessarily increases the number of parameters, so it can be more efficient to drop them.
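A sketch of the rank-deficient case on synthetic data (scikit-learn assumed): six hand-crafted features that really span only three dimensions, where PCA shows that three components capture essentially all of the variance, so the remaining three can be dropped without losing anything.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
base = rng.normal(size=(300, 3))  # 3 truly independent features

# A redundant "hand-crafted" feature vector: 6 columns, but only rank 3,
# because the last 3 are linear combinations of the first 3.
X = np.hstack([base, base @ rng.normal(size=(3, 3))])

pca = PCA().fit(X)
print(pca.explained_variance_ratio_.round(3))

# Only 3 components carry any variance; the rest are numerically zero.
n_useful = int(np.sum(pca.explained_variance_ratio_ > 1e-10))
print(n_useful)  # 3
```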

Common Sense

If we take a wider view of dimensionality reduction, we can still reduce the dimension of our data by using common sense.

Consider the MNIST task. The digits occupy the center of each image, and if you look at the whole data set, you will find pixels around the periphery that take the same value in every sample. If you trim each image to exclude these constant pixels, you've taken a significant step towards reducing how much computational power you need, since all of these pixels are now effectively "skipped over". Constant pixels carry no useful information for the network, because their values never vary between samples, so you lose no distinguishing information by dropping them.
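This kind of common-sense reduction is a one-liner in NumPy. A sketch on toy data standing in for MNIST (only a central 20×20 block of each 28×28 image ever varies, by construction): keep exactly the pixels whose values vary across the data set.

```python
import numpy as np

rng = np.random.default_rng(3)
images = np.zeros((1000, 28 * 28))

# Toy stand-in for MNIST: only a central 20x20 block of pixels varies;
# the periphery is constant in every sample.
center = np.arange(28 * 28).reshape(28, 28)[4:24, 4:24].ravel()
images[:, center] = rng.integers(0, 256, size=(1000, center.size))

# Keep only pixels that are not constant across the whole data set.
varying = images.std(axis=0) > 0
reduced = images[:, varying]
print(images.shape[1], "->", reduced.shape[1])  # 784 -> 400
```

No label information is used here, so the same mask can be computed once and applied identically to training and test images.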
