I recently produced several PCA plots. Because I cannot compute eigenvalues for the higher dimensions, I tried to extrapolate them from the eigenvalues I do have. I want to do this to check whether PC1 explains a sufficient proportion of the total variance.
For an analysis up to the 25th dimension (i.e. up to PC25), PC1 is able to explain 40.6% of total variance. I extrapolated eigenvalues beyond the 25th dimension using an exponential line-of-best-fit based on the available 25 eigenvalues. Here is the plot when I tried to extrapolate to the 50th eigenvalue:
Line-of-best-fit equation: y = 0.7415705 + (64097570 - 0.7415705)/(1 + (x/0.000446242)^1.708334)
From the plot, the line of best fit looks like a good model. However, the biggest problem is that the extrapolated eigenvalues never converge to zero, even at very high principal component numbers. This means that if I include eigenvalues up to the 50th, PC1 now explains only 37.5% of the total variance. If I extrapolate up to the 1000th eigenvalue, PC1 explains a mere 12.2%.
Does this procedure seem reasonable to you? Because the extrapolated values never converge to zero, the more eigenvalues I include, the smaller the share of total variance that PC1 explains.
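To make the effect concrete, here is a minimal Python sketch of the calculation described above. It is only an illustration under one loud assumption: the fitted curve is used for every component, including the first 25, because the actual eigenvalues are not listed in the question. The percentages it prints will therefore not match the figures quoted exactly, but the trend is the same. The function names fitted_eigenvalue and pc1_share are hypothetical, chosen just for this example.

```python
# Sketch: how PC1's share of variance shrinks when extrapolated eigenvalues
# level off at a positive floor instead of decaying to zero.
# Assumption: the fitted curve stands in for ALL eigenvalues, including the
# first 25, since the real values are not given in the question.

def fitted_eigenvalue(k, floor=0.7415705, top=64097570.0,
                      scale=0.000446242, power=1.708334):
    """Value of the quoted line-of-best-fit at component number k."""
    return floor + (top - floor) / (1.0 + (k / scale) ** power)


def pc1_share(n_components):
    """Share of the summed eigenvalues attributed to PC1 when the
    first n_components fitted eigenvalues are included."""
    values = [fitted_eigenvalue(k) for k in range(1, n_components + 1)]
    return values[0] / sum(values)


for n in (25, 50, 1000):
    print(f"{n:5d} components: PC1 share = {100 * pc1_share(n):.1f}%")
```

Because the curve flattens out near 0.74 rather than zero, every extra extrapolated component adds roughly that much to the denominator, so the PC1 share keeps falling for as long as components are added.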
Basically, my questions are: Must eigenvalues converge to zero at higher dimensions? Or is it possible for eigenvalues to converge at a finite value?
Best Answer
Recall two facts:
In PCA, the "matrix" mentioned above is the correlation or covariance matrix of your data set, so (provided none of your variables is a linear combination of the others):
This means that the extrapolation you've made suffers from two problems:
To answer your question: technically, eigenvalues do not "converge" to anything, because there is only a finite number of them.
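A small numerical check may help here. The sketch below uses NumPy on synthetic data (the data, the seed, and the variable names are all made up for illustration, not taken from the question). It relies on two standard linear-algebra facts: a p x p matrix has exactly p eigenvalues, and their sum equals the trace of the matrix, which for a correlation matrix is p.

```python
import numpy as np

# Synthetic data: 200 observations of 6 variables (purely illustrative).
rng = np.random.default_rng(0)
n_samples, n_vars = 200, 6
data = rng.normal(size=(n_samples, n_vars))
data[:, 1] += 0.8 * data[:, 0]  # make two of the variables correlated

# PCA on standardized variables works with the correlation matrix.
corr = np.corrcoef(data, rowvar=False)

# eigvalsh is meant for symmetric matrices; it returns ascending order,
# so reverse to get the usual PC1, PC2, ... ordering.
eigvals = np.linalg.eigvalsh(corr)[::-1]

print("number of eigenvalues:", len(eigvals))      # one per variable
print("sum of eigenvalues:   ", eigvals.sum())     # trace of corr
print("PC1 share of variance:", eigvals[0] / eigvals.sum())
```

With the total fixed by the trace and the count fixed by the number of variables, the proportion explained by PC1 has a definite denominator. An extrapolated tail that can be extended indefinitely respects neither constraint.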