Solved – PCA basic: Must eigenvalues converge to zero at high dimensions?

dimensionality-reduction, eigenvalues, extrapolation, pca

Recently, I obtained several PCA plots, and because I am unable to compute eigenvalues for higher dimensions, I tried to extrapolate them from the available data. I want to do this to check whether PC1 explains a sufficient proportion of the total variance.

For an analysis up to the 25th dimension (i.e. up to PC25), PC1 explains 40.6% of the total variance. I extrapolated eigenvalues beyond the 25th dimension using an exponential line of best fit through the 25 available eigenvalues. Here is the plot when I extrapolate to the 50th eigenvalue:

Extrapolation up to the 50th eigenvalue

Line-of-best-fit equation: y = 0.7415705 + (64097570 – 0.7415705)/(1 + (x/0.000446242)^1.708334)
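To see the problem concretely, here is a small sketch (not the original analysis; only the curve parameters are copied verbatim from the equation above) that evaluates the fitted curve at increasingly large x and shows it levels off near 0.74 rather than zero:

```python
# Sketch: evaluate the fitted line-of-best-fit from the equation above.
# The four parameters are taken verbatim from that equation; everything
# else here is illustrative.

def fitted_eigenvalue(x: float) -> float:
    """Four-parameter logistic fit: y = d + (a - d) / (1 + (x/c)^b)."""
    a, d, c, b = 64097570.0, 0.7415705, 0.000446242, 1.708334
    return d + (a - d) / (1 + (x / c) ** b)

# The curve decreases toward its lower asymptote d = 0.7415705,
# so the "eigenvalues" it predicts never reach zero.
for x in (25, 50, 1000, 10**6):
    print(x, fitted_eigenvalue(x))
```

Because the lower asymptote of this functional form is the constant term 0.7415705, every extrapolated eigenvalue stays above it, which is exactly why the cumulative share of variance attributed to PC1 keeps shrinking as more fictitious eigenvalues are added.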

From the plot, the line of best fit appears to be a good model. The biggest problem, however, is that the extrapolated eigenvalues never converge to zero at extremely high principal components. This means that if I include eigenvalues up to the 50th, PC1 explains only 37.5% of the total variance; extrapolating up to the 1000th eigenvalue, PC1 explains a mere 12.2%.

Does this procedure seem reasonable to you? Because the extrapolation never converges to zero, the further out I extrapolate, the less of the total variance PC1 explains.

Basically, my questions are: must eigenvalues converge to zero at higher dimensions, or can they converge to a finite nonzero value?

Best Answer

Recall two facts:

  • The number of nonzero eigenvalues of a matrix equals its rank.
  • The sum of the eigenvalues of a matrix equals its trace (the sum of its diagonal entries).

In PCA "matrix" mentioned above is correlation or covariance matrix of your data set, so (provided none of your variables is linear combination of others):

  • The number of eigenvalues equals the number of variables.
  • The sum of the eigenvalues equals the number of variables (if you use the correlation matrix) or the sum of their variances (if you use the covariance matrix).
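A quick numerical check of these facts (a hypothetical example, not the asker's data): generate a data set with p variables and verify that the correlation matrix has exactly p eigenvalues summing to p, while the covariance matrix's eigenvalues sum to its trace, the total variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5                    # hypothetical sample size and variable count
X = rng.normal(size=(n, p))      # toy data, full rank with probability 1

corr = np.corrcoef(X, rowvar=False)
cov = np.cov(X, rowvar=False)

corr_eigs = np.linalg.eigvalsh(corr)  # symmetric matrix -> real eigenvalues
cov_eigs = np.linalg.eigvalsh(cov)

print(len(corr_eigs))                 # exactly p eigenvalues, no more
print(corr_eigs.sum())                # = p, the trace of the correlation matrix
print(cov_eigs.sum(), np.trace(cov))  # equal: trace = sum of the variances
```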

This means that the extrapolation you've made suffers from two problems:

  • You extrapolate too far away (do you have 1000 variables, so that a 1000th eigenvalue even exists?).
  • You ignore the fact that the sum of the eigenvalues is already known.
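The second point suggests a fix that needs no extrapolation at all: the denominator of the explained-variance ratio is the trace of the covariance (or correlation) matrix, which you can compute directly from the variables even when only the first few eigenvalues are available. A minimal sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
# toy data with unequal variances; not the asker's data set
X = rng.normal(size=(300, 10)) * np.arange(1, 11)

cov = np.cov(X, rowvar=False)
eigs = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order

# PC1's explained-variance ratio needs only the largest eigenvalue
# and the trace -- not the individual values of the remaining eigenvalues.
pc1_ratio = eigs[0] / np.trace(cov)
print(pc1_ratio)
```

Since the trace equals the full sum of eigenvalues, this ratio is exact, with no dependence on any fitted curve.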

To answer your question: technically, the eigenvalues do not "converge" to anything, because there are only finitely many of them.
