Solved – Why do we need PCA whitening before feeding into autoencoder

autoencoders, dimensionality reduction, neural networks, pca

In the UFLDL tutorial, we saw that an autoencoder cannot compress data consisting of uncorrelated random variables.
'If the input were completely random—say, each variable comes from an IID Gaussian independent of the other features—then this compression task would be very difficult.'

However, it is suggested to apply PCA whitening before feeding the data into an autoencoder. Using PCA, the data can be represented in terms of new orthogonal variables, which are uncorrelated.

Can the autoencoder still compress the data using a hidden layer after PCA preprocessing?

Best Answer

Natural images have a lot of variance/energy in low spatial frequency components and little variance/energy in high spatial frequency components*. When using squared Euclidean distance to evaluate the reconstruction of an autoencoder, this means that the network will focus on getting the low spatial frequencies right, since the error scales with the variance of the signal. Whitening normalizes the variances so that the network gets punished equally for errors in low and high spatial frequencies.
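As a concrete sketch of what PCA whitening does to the variances (this code is not from the original answer; the data matrix and the small regularization constant `eps` are illustrative assumptions), here is how the whitening matrix $W = D^{-\frac{1}{2}}Q^\top$ used in the derivation below can be computed with NumPy:

```python
import numpy as np

# Hypothetical data matrix: rows are samples (e.g. flattened image patches).
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 64))

# Center the data and estimate the covariance C.
X = X - X.mean(axis=0)
C = np.cov(X, rowvar=False)

# Eigendecomposition C = Q D Q^T and the PCA whitening matrix W = D^{-1/2} Q^T.
eigvals, Q = np.linalg.eigh(C)
eps = 1e-5  # small constant to avoid dividing by near-zero eigenvalues (assumed)
W = np.diag(1.0 / np.sqrt(eigvals + eps)) @ Q.T

# Whitened data: every direction now has (approximately) unit variance,
# so reconstruction errors are penalized equally across components.
Y = X @ W.T
print(np.cov(Y, rowvar=False).round(2))  # ≈ identity matrix
```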

Whitening effectively changes the objective function. Let $C$ be the covariance of the inputs and $W$ be the PCA whitening matrix, \begin{align} C &= QDQ^\top, & W &= D^{-\frac{1}{2}}Q^\top. \end{align} Further, let $x$ be some input, $\hat x$ be the output of the autoencoder, $y = Wx$ be the whitened signal, and $\hat y = W\hat x$ its reconstruction. Then \begin{align} ||y - \hat y||_2^2 &= ||W x - W \hat x||_2^2 \\ &= (Wx - W\hat x)^\top (Wx - W\hat x) \\ &= (x - \hat x)^\top W^\top W (x - \hat x) \\ &= (x - \hat x)^\top C^{-1} (x - \hat x) \\ &= ||x - \hat x||_{C^{-1}}^2, \end{align} where we used $W^\top W = Q D^{-1} Q^\top = C^{-1}$. That is, by optimizing $\hat y$ instead of $\hat x$, we are effectively optimizing a particular Mahalanobis distance instead of the standard Euclidean distance.
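For readers who want to check this identity numerically, here is a minimal sketch (the covariance, input, and reconstruction are all illustrative assumptions, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative covariance C = Q D Q^T and whitening matrix W = D^{-1/2} Q^T.
A = rng.normal(size=(5, 5))
C = A @ A.T + 1e-3 * np.eye(5)       # a random positive-definite covariance
eigvals, Q = np.linalg.eigh(C)
W = np.diag(eigvals ** -0.5) @ Q.T

# Some input x and a hypothetical autoencoder reconstruction x_hat.
x = rng.normal(size=5)
x_hat = x + 0.1 * rng.normal(size=5)

# Left-hand side: squared Euclidean error in the whitened space.
lhs = np.sum((W @ x - W @ x_hat) ** 2)

# Right-hand side: Mahalanobis distance (x - x_hat)^T C^{-1} (x - x_hat).
d = x - x_hat
rhs = d @ np.linalg.solve(C, d)

print(np.isclose(lhs, rhs))  # True
```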

*see http://tdlc.ucsd.edu/images/facescheung.jpg for an illustrative example of spatial frequencies
