Solved – Relation between variance of eigenvalues and the effectiveness of PCA on the data

eigenvalues, machine-learning, pca, variance

If the covariance matrix has eigenvalues $$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d > 0,$$ why is the variance of the eigenvalues, $$\sigma^2=\frac{1}{d}\sum_{i=1}^d (\lambda_i-\bar \lambda)^2,$$ a measure of whether or not PCA would be useful for analyzing the data (the higher the value of $\sigma^2$, the more useful PCA)?
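For concreteness, here is a minimal sketch (assuming NumPy and a made-up toy dataset) of how this $\sigma^2$ would be computed from the sample covariance matrix:

```python
import numpy as np

# Toy data: 500 samples in 4 dimensions with very different scales,
# so the covariance eigenvalues are spread out.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

S = np.cov(X, rowvar=False)                  # sample covariance matrix
lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues, sorted descending

# Variance of the eigenvalues, as in the question
sigma2 = np.mean((lam - lam.mean()) ** 2)
print(lam)
print(sigma2)   # large here, because the coordinate scales differ a lot
```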

Best Answer

As @ttnphns said, if the eigenvalues are all similar to one another, the covariance matrix of your multivariate vector is close to spherical.

Before getting to the point of the answer: suppose for a moment that all eigenvalues are different.

Then $\lambda_1$ is associated with the one-dimensional affine subspace onto which the projection of the data shows the highest variance (it is usually said to be the direction that "explains" most of the variance).

Then $\lambda_2$ is associated with the direction, orthogonal to the first one, that "explains" most of the variance among all directions orthogonal to it (here "explain" is in the same sense as above: projecting the data onto this subspace shows greater variance than any other projection, subject to being orthogonal to the direction associated with $\lambda_1$).

And with $\lambda_3, \lambda_4, \ldots, \lambda_d$ you can keep finding directions, orthogonal to the previous ones, that explain variance. By this construction, the first direction explains most of the variance of the data, the second explains less than the first but more than the rest, and so on. So, sometimes, the first $p < d$ principal components (i.e., the projections of the data onto these directions) are used to describe the data, as they retain most of its variability while reducing the dimensionality of the dataset.
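As a small illustration (a sketch with simulated correlated data, assuming NumPy), the principal directions and the fraction of variance each one explains can be read directly off the eigendecomposition of the sample covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=(n, 1))
# Correlated 3-d data: most of the variability lies along a single direction.
X = np.hstack([z + 0.1 * rng.normal(size=(n, 1)),
               2.0 * z + 0.1 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 1))])

S = np.cov(X, rowvar=False)
lam, V = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]            # sort eigenpairs in descending order
lam, V = lam[order], V[:, order]

explained = lam / lam.sum()              # fraction of variance per direction
print(np.round(explained, 3))            # the first direction dominates
print(np.round(np.cumsum(explained), 3)) # a single PC already keeps most of it

# The first p principal components: project centred data onto the top p directions
p = 1
scores = (X - X.mean(axis=0)) @ V[:, :p]
```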

And here is the big point: all of this holds only if the eigenvalues differ from one another. If $\lambda_1 = \lambda_2$, then you cannot find a single 1-d subspace that explains most of the variance; you need a 2-d plane to do that (the one spanned by the eigenvectors associated with $\lambda_1$ and $\lambda_2$). You cannot (essentially) say anything about the individual directions by themselves, but you can be sure that the plane they span explains a higher percentage of the variance than any other 2-d plane. So data reduction will, at most, be possible down to 2-d; no 1-d representation makes sense.
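A quick numerical illustration of this case (again a sketch with simulated data, assuming NumPy): when the first two eigenvalues are equal, every unit direction inside the plane they span captures essentially the same projected variance, so no single 1-d direction stands out:

```python
import numpy as np

rng = np.random.default_rng(2)
# Population covariance has lambda_1 = lambda_2 = 4 and lambda_3 = 0.09,
# so no single 1-d direction is best: only the (x1, x2) plane is.
X = np.hstack([2.0 * rng.normal(size=(5000, 2)),
               0.3 * rng.normal(size=(5000, 1))])

S = np.cov(X, rowvar=False)
print(np.round(np.sort(np.linalg.eigvalsh(S))[::-1], 2))  # first two nearly equal

# The projected variance u' S u is (almost) the same for every unit vector u
# lying in the plane spanned by the first two eigenvectors:
for angle in np.linspace(0.0, np.pi / 2, 4):
    u = np.array([np.cos(angle), np.sin(angle), 0.0])
    print(round(float(u @ S @ u), 2))    # about 4, regardless of the direction
```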

In the extreme case in which all eigenvalues are the same, the data cannot be projected optimally (in the sense of variance explanation understood as above), and no data reduction can be performed. If the variance of the eigenvalues is small, then something like this last case is happening: the eigenvalues are very similar to each other, so successive directions explain almost the same percentage of the variance (near $\frac{1}{d}$ each in the extreme, almost-equal case). PCA would then not be useful as a dimension-reduction technique, because in order to get a representation of the data that retains a significant amount of the original variability, you would need close to $d$ dimensions, so no reduction really takes place. For example, if $d=8$ and the eigenvalues are almost the same (i.e., their variance is near zero), then using 3 PCs explains only around $\frac{3}{8} = 37.5\%$ of your original data variance: no low-dimensional PC representation keeps track of your data's variability, so it may hide a lot of its behaviour and should not be used.
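This extreme case can be checked the same way (a sketch with isotropic simulated data, assuming NumPy): the eigenvalue variance is near zero, and 3 of the 8 PCs explain only about $3/8$ of the total variance:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
# Isotropic data: all population eigenvalues are equal, so their variance is ~0.
X = rng.normal(size=(20000, d))

S = np.cov(X, rowvar=False)
lam = np.sort(np.linalg.eigvalsh(S))[::-1]

print(round(float(np.var(lam)), 4))      # eigenvalue variance close to zero
cum = np.cumsum(lam) / lam.sum()
print(round(float(cum[2]), 3))           # 3 PCs explain only about 3/8 = 0.375
```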
