Solved – What does “PCA (Principal Component Analysis) spheres the data” mean

pca

I was reading some notes and it says that PCA can "sphere the data". What they define to me as "sphering the data" is dividing each dimension by the square root of the corresponding eigenvalue.

I am assuming that by "dimension" they mean each basis vector into which we are projecting (i.e. the eigenvectors we are projecting to). Thus I guess they are doing:

$$ u^{'}_i= \frac{u_i}{\sqrt{eigenValue(u_i)}}$$

where $u_i$ is one of the eigenvectors (i.e. one of the principal components). Then with that new vector, I am assuming they are projecting the raw data we have, say $x^{(j)}$, to $z'^{(j)}$. So the $i$-th coordinate of the projected point would now be:

$$ z'^{(j)}_i = u^{'}_i \cdot x^{(j)}$$

They claim that doing this ensures that all the features have the same variance.

However, I am not even sure if my interpretation of what they mean by sphering is correct, and wanted to check whether it is. Also, even if it is correct, what is the point of doing something like this? I know they claim it makes sure every feature has the same variance, but why would we want that, and how does dividing by the square root of the eigenvalue achieve it?

Best Answer

Your understanding is right. Have a look at this figure which represents various possibilities of your data points: http://shapeofdata.files.wordpress.com/2013/02/pca22.png

They look ellipsoidal. If you do what you've described above, i.e. rescale the points along each principal direction so that the direction in which they are spread the most (approximately the 45 degree line in the image) is compressed relative to the others, the points will lie in a circle (a sphere in higher dimensions).
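As for how this equalizes the variances: if $\Sigma$ is the covariance matrix of the centered data and $u_i$ is a unit eigenvector with eigenvalue $\lambda_i$, the projection $u_i \cdot x$ has variance $u_i^\top \Sigma u_i = \lambda_i$, so dividing by $\sqrt{\lambda_i}$ leaves variance 1 in every direction. Here is a minimal NumPy sketch of that. The synthetic data and all variable names are my own assumptions for illustration, not something from your notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated 2-D data (an assumption, purely for illustration).
cov_true = np.array([[3.0, 2.0],
                     [2.0, 2.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov_true, size=5000)

# Center the data, then eigendecompose the sample covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are the u_i

# Project onto the principal directions, then divide each projected
# coordinate by the square root of its eigenvalue (the "sphering" step).
Z = Xc @ eigvecs
Z_sphered = Z / np.sqrt(eigvals)

# The covariance of the sphered data should be the identity matrix
# (up to floating-point error).
cov_sphered = np.cov(Z_sphered, rowvar=False)
print(np.round(cov_sphered, 3))
```

The off-diagonal entries come out as zero as well, so the sphered coordinates are not only equal in variance but also uncorrelated.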

One reason you spherify the data is to make it easier, when doing prediction, to understand which coordinates are important. Say you wish to predict $y$ using $x_1$ and $x_2$, and you get coefficient values $\beta_1$ and $\beta_2$, i.e. $y\sim \beta_1 x_1+\beta_2x_2 $. Now if $x_1$ and $x_2$ have the same variance, i.e. they are roughly distributed spherically, and you find that $\beta_1=1$ while $\beta_2=10$, you can interpret this as saying that $x_2$ influences $y$ more than $x_1$. If their scales were not the same, however, and $x_1$ had a spread (standard deviation) 10 times that of $x_2$, then you would get the above values of $\beta_1$ and $\beta_2$ even if they both influenced $y$ roughly the same. To summarize, you "spherify" or "normalize" so that you can make inferences about a variable's importance from its coefficient.
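The coefficient argument can be checked numerically with a rough sketch like the one below. The data are synthetic and the names are hypothetical. For simplicity it rescales each predictor to unit variance instead of doing a full PCA sphering, which is close to the same thing here only because the two predictors are generated independently.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# x1 has 10 times the standard deviation of x2 (illustrative assumption).
x1 = rng.normal(scale=10.0, size=n)
x2 = rng.normal(scale=1.0, size=n)

# Both predictors influence y equally: a one-standard-deviation change
# in either one moves y by about 10.
y = 1.0 * x1 + 10.0 * x2 + rng.normal(scale=0.1, size=n)

# Least-squares fit on the raw predictors (no intercept; means are ~0).
X_raw = np.column_stack([x1, x2])
beta_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)

# Rescale each predictor to unit variance and refit.
X_std = X_raw / X_raw.std(axis=0)
beta_std, *_ = np.linalg.lstsq(X_std, y, rcond=None)

print("raw coefficients:   ", np.round(beta_raw, 2))
print("scaled coefficients:", np.round(beta_std, 2))
```

On the raw data the coefficients are close to 1 and 10, which wrongly suggests that $x_2$ matters ten times more. After rescaling, the two coefficients are roughly equal, matching the fact that both predictors influence $y$ about the same.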
