[Math] Why is the eigenvector of a covariance matrix equal to a principal component

covarianceeigenvalues-eigenvectorslinear algebrastatistics

If I have a covariance matrix for a data set and I multiply it times one of it's eigenvectors. Let's say the eigenvector with the highest eigenvalue. The result is the eigenvector or a scaled version of the eigenvector.

What does this really tell me? Why is this the principal component? What property makes it a principal component? Geometrically, I understand that the principal component (eigenvector) will be sloped at the general slope of the data (loosely speaking). Again, can someone help understand why this happens?

Best Answer

Short answer: The eigenvector with the largest eigenvalue is the direction along which the data set has the maximum variance. Meditate upon this.

Long answer: Let's say you want to reduce the dimensionality of your data set, say down to just one dimension. In general, this means picking a unit vector $u$, and replacing each data point, $x_i$, with its projection along this vector, $u^T x_i$. Of course, you should choose $u$ so that you retain as much of the variation of the data points as possible: if your data points lay along a line and you picked $u$ orthogonal to that line, all the data points would project onto the same value, and you would lose almost all the information in the data set! So you would like to maximize the variance of the new data values $u^T x_i$. It's not hard to show that if the covariance matrix of the original data points $x_i$ was $\Sigma$, the variance of the new data points is just $u^T \Sigma u$. As $\Sigma$ is symmetric, the unit vector $u$ which maximizes $u^T \Sigma u$ is nothing but the eigenvector with the largest eigenvalue.

If you want to retain more than one dimension of your data set, in principle what you can do is first find the largest principal component, call it $u_1$, then subtract that out from all the data points to get a "flattened" data set that has no variance along $u_1$. Find the principal component of this flattened data set, call it $u_2$. If you stopped here, $u_1$ and $u_2$ would be a basis of the two-dimensional subspace which retains the most variance of the original data; or, you can repeat the process and get as many dimensions as you want. As it turns out, all the vectors $u_1, u_2, \ldots$ you get from this process are just the eigenvectors of $\Sigma$ in decreasing order of eigenvalue. That's why these are the principal components of the data set.