[Math] Why the principal components correspond to the eigenvalues

linear algebra, matrices, probability, statistics

Suppose $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ are the original components (also random variables) and $\mathbf{w}_j = (\omega_1, \omega_2, \ldots, \omega_n)$ are the loadings for the $j$th principal component, satisfying $\mathbf{w}_j^{\mathrm{T}}\mathbf{w}_j = 1$ and $\mathbf{w}_i^{\mathrm{T}}\mathbf{w}_j = 0$ for $i \neq j$, so that $z_j = \mathbf{w}_j^{\mathrm{T}}\mathbf{X}$ is the $j$th component.

To find the first principal component, we maximize the variance of $z_1$, which is $\mathrm{var}(z_1) = \mathrm{var}(\mathbf{w}_1^{\mathrm{T}}\mathbf{X}) = \mathbf{w}_1^{\mathrm{T}}\,\mathrm{var}(\mathbf{X})\,\mathbf{w}_1$. Estimating $\mathrm{var}(\mathbf{X})$ by the sample covariance matrix $\mathbf{S}$, we maximize $L = \mathbf{w}_1^{\mathrm{T}}\mathbf{S}\mathbf{w}_1 - \lambda(\mathbf{w}_1^{\mathrm{T}}\mathbf{w}_1 - 1)$, where $\lambda$ is the Lagrange multiplier. Setting the derivative with respect to $\mathbf{w}_1$ to zero gives $(\mathbf{S} - \lambda\mathbf{I})\mathbf{w}_1 = \mathbf{0}$, so $\mathbf{w}_1$ must be an eigenvector of the sample covariance matrix $\mathbf{S}$.
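
For anyone who wants the differentiation step spelled out, here is one way to write it. It relies on $\mathbf{S}$ being symmetric, which holds for any sample covariance matrix:

$$\frac{\partial L}{\partial \mathbf{w}_1} = 2\mathbf{S}\mathbf{w}_1 - 2\lambda\mathbf{w}_1 = \mathbf{0} \quad\Longrightarrow\quad \mathbf{S}\mathbf{w}_1 = \lambda\mathbf{w}_1 .$$

Differentiating with respect to $\lambda$ just recovers the constraint $\mathbf{w}_1^{\mathrm{T}}\mathbf{w}_1 = 1$.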

Here is where I get stuck. Solving the equation gives you all the eigenvalues and eigenvectors. All the materials I found online simply tell you to rank the eigenvalues: the eigenvector of the largest eigenvalue is the first principal component, the eigenvector of the second largest eigenvalue is the second principal component, and so on and so forth.

My question is: how do we show or prove that the largest eigenvalue corresponds to the largest variance, the second largest eigenvalue to the second largest variance, and so on? Thank you.

Best Answer

Ok. I now feel this question is a little dumb. I finally know why. I hope my answer will be helpful for someone else.

The reason is so simple! We are trying to maximize $\mathbf{w}^{\mathrm{T}}\mathbf{S}\mathbf{w}$, and we now know the candidates are unit eigenvectors, so just plug them back in. Using $\mathbf{S}\mathbf{w} = \lambda\mathbf{w}$ and $\mathbf{w}^{\mathrm{T}}\mathbf{w} = 1$, we have $\mathbf{w}^{\mathrm{T}}\mathbf{S}\mathbf{w} = \mathbf{w}^{\mathrm{T}}\lambda\mathbf{w} = \lambda\mathbf{w}^{\mathrm{T}}\mathbf{w} = \lambda$. That means each eigenvalue is nothing but the variance of the component along its eigenvector. That's why we rank by eigenvalues!
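
If it helps, the identity can be checked numerically. Below is a minimal sketch in plain Python on a small made-up 2-D data set (the numbers and the helper names `sample_cov`, `unit_eigenvector`, `projected_variance` are only for illustration). It computes the sample covariance matrix, gets its eigenvalues in closed form for the 2x2 case, and compares each eigenvalue with the sample variance of the data projected onto the matching eigenvector. It also scans unit vectors to see whether any direction gives a larger variance than the top eigenvalue.

```python
import math
from statistics import mean, variance

# Hypothetical 2-D sample: each row is one observation (x1, x2).
data = [
    (2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
    (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9),
]

def sample_cov(u, v):
    """Unbiased sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)

x1 = [row[0] for row in data]
x2 = [row[1] for row in data]

# Entries of the symmetric sample covariance matrix S = [[a, b], [b, c]].
a, b, c = sample_cov(x1, x1), sample_cov(x1, x2), sample_cov(x2, x2)

# Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
half_trace = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
eigenvalues = [half_trace + radius, half_trace - radius]

def unit_eigenvector(lam):
    """Unit eigenvector of S for eigenvalue lam (assumes b != 0)."""
    # First row of (S - lam*I) w = 0 reads (a - lam)*w1 + b*w2 = 0,
    # which w = (b, lam - a) satisfies.
    w1, w2 = b, lam - a
    norm = math.hypot(w1, w2)
    return (w1 / norm, w2 / norm)

def projected_variance(w):
    """Sample variance of z = w^T X over the data set."""
    return variance([w[0] * p + w[1] * q for p, q in data])

eigenvectors = [unit_eigenvector(lam) for lam in eigenvalues]
variances = [projected_variance(w) for w in eigenvectors]

# Scan unit vectors (cos t, sin t) in 0.1 degree steps over a half circle
# (w and -w give the same variance) and keep the largest variance seen.
best_scanned = max(
    projected_variance((math.cos(t), math.sin(t)))
    for t in (k * math.pi / 1800 for k in range(1800))
)

for lam, var in zip(eigenvalues, variances):
    print(f"eigenvalue = {lam:.6f}, variance of projection = {var:.6f}")
print(f"largest variance found by scanning directions = {best_scanned:.6f}")
```

Each printed eigenvalue should agree with the variance next to it up to floating-point error, and the scanned maximum should come out just at or below the largest eigenvalue. The closed-form eigenvalue formula is specific to 2x2 matrices; for larger problems one would normally call a numerical eigen-solver instead.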

It is so simple, but it really took me an entire day to figure out...
