[Math] Why are the eigenvalues of a covariance matrix equal to the variance of its eigenvectors

covariance, eigenvalues-eigenvectors, linear algebra, machine learning, variance

This assertion came up in a Deep Learning course I am taking. I understand intuitively that the eigenvector with the largest eigenvalue will be the direction in which the most variance occurs. I understand why we use the covariance matrix's eigenvectors for Principal Component Analysis.

However, I do not see why the variance along each eigenvector equals its eigenvalue. I would prefer a formal proof, but an intuitive explanation would also be acceptable.

(Note: this is not a duplicate of this question.)

Best Answer

Here's a formal proof. Suppose that $v$ is a unit-length eigenvector of the covariance matrix $$ \Sigma = \Bbb E[XX^T], $$ where $X = (X_1,X_2,\dots,X_n)$ is a column vector of random variables with mean zero (that is, we have already absorbed the means into the variables' definitions). So we have $\Sigma v = \lambda v$ for some $\lambda \geq 0$, and $v^Tv = 1$.
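As a concrete illustration of this setup (not part of the proof), here is a minimal NumPy sketch; the matrix `Sigma` below is just an arbitrary positive semidefinite matrix I construct to stand in for $\Bbb E[XX^T]$:

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive semidefinite matrix, standing in
# for the covariance Sigma = E[X X^T] (illustrative choice).
A = rng.standard_normal((3, 3))
Sigma = A @ A.T

# eigh returns eigenvalues in ascending order and unit-length
# eigenvectors as the columns of `vecs`.
vals, vecs = np.linalg.eigh(Sigma)

v, lam = vecs[:, -1], vals[-1]          # top eigenpair
print(np.allclose(Sigma @ v, lam * v))  # Sigma v = lambda v  -> True
print(np.isclose(v @ v, 1.0))           # v^T v = 1           -> True
```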

Now, what do we really mean by "the variance of $v$"? $v$ is not a random variable. What we actually mean is the variance of the associated component of $X$; that is, we're asking about the variance of $v^TX$ (the dot product of $X$ with $v$). Since the $X_i$ have mean zero, so does $v^TX$, and its variance is therefore just its second moment. We then find $$ \Bbb E([v^TX]^2) = \Bbb E([v^TX][X^Tv]) = \Bbb E[v^T(XX^T)v] = v^T\Bbb E(XX^T) v \\ = v^T\Sigma v = v^T(\lambda v) = \lambda(v^Tv) = \lambda, $$ which is what we wanted to show.
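You can also see the conclusion numerically: sample a mean-zero $X$ with a chosen covariance and check that the sample variance of each projection $v^TX$ approximates the corresponding eigenvalue. A quick sketch (the Gaussian distribution and the particular `Sigma` are just convenient illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Population covariance for a mean-zero X (illustrative choice).
A = rng.standard_normal((3, 3))
Sigma = A @ A.T

vals, vecs = np.linalg.eigh(Sigma)

# Draw many mean-zero samples with covariance Sigma.
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)

# For each unit eigenvector v, the sample variance of the projection
# v^T X should approximate the corresponding eigenvalue lambda.
for lam, v in zip(vals, vecs.T):
    proj = X @ v
    print(f"lambda = {lam:.4f}   sample Var(v^T X) = {proj.var():.4f}")
```

The two columns agree up to sampling error, matching the identity $\operatorname{Var}(v^TX) = v^T\Sigma v = \lambda$ derived above.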