Solved – Does it make sense to use PCA when the determinant of the correlation matrix is (almost) zero

I'm running a PCA on a data set of size $N \times p$ ($N\approx 1000$ is the number of measurements and $p\approx 200$ is the number of dimensions/predictors).

I expect many of the predictors to be correlated, so the dimensionality should be reducible. I can even drop some columns that are linearly dependent on the others.

When I run the PCA, I find that $\sim 50\%$ of the variance is explained by the first 5 PCs, which suggests that the predictors can indeed be grouped.

But I am concerned about how small the determinant of the correlation matrix $R$ is: $\det(R) \approx 10^{-100}$, or some similarly ridiculous number.
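For concreteness, here is a minimal sketch, assuming Python with NumPy, of how the two quantities above (variance explained by the first 5 PCs and $\det(R)$) can be computed. The helper name `pca_diagnostics` is hypothetical, and the data are synthetic (5 latent factors plus noise), not the data from the question. It uses `np.linalg.slogdet`, which returns the sign and the natural log of $|\det(R)|$, so extremely small determinants are reported as a logarithm instead of risking underflow.

```python
import numpy as np

def pca_diagnostics(X, k=5):
    """Hypothetical helper: variance share of the first k PCs and log10 det(R).

    X is an N x p array (rows = measurements, columns = predictors).
    """
    R = np.corrcoef(X, rowvar=False)          # p x p correlation matrix
    eigvals = np.linalg.eigvalsh(R)[::-1]     # eigenvalues of R, largest first
    explained = eigvals[:k].sum() / eigvals.sum()
    # slogdet returns (sign, natural log of |det|), so a tiny determinant
    # is handled as a logarithm instead of a number that may underflow
    sign, logdet = np.linalg.slogdet(R)
    return explained, sign, logdet / np.log(10)

# Synthetic stand-in data (an assumption): 5 latent factors plus noise
rng = np.random.default_rng(0)
N, p, n_factors = 1000, 200, 5
latent = rng.normal(size=(N, n_factors))
loadings = rng.normal(size=(n_factors, p))
X = latent @ loadings + 0.5 * rng.normal(size=(N, p))

explained, sign, log10_det = pca_diagnostics(X, k=5)
print(f"first 5 PCs explain {explained:.1%} of the variance")
print(f"log10 det(R) = {log10_det:.1f}")
```

Even this well-behaved synthetic data set, with no exactly dependent columns, gives a determinant many orders of magnitude below 1, because the noise-only directions contribute many eigenvalues well below 1 to the product.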

Do the results make sense with such a small determinant?

Moreover, I see that the PCA results change (a lot!) if I round the input numbers to drop insignificant digits, e.g. around the 10th digit. I think this is linked to the fact that we are working with such a small determinant.

Since a small determinant of $R$ indicates that there are redundant dimensions, I would say that PCA is the way to go to remove them. Nevertheless, does it make sense to run a PCA with such a small determinant? If not, what is the best way to reduce the dimensionality of the problem?

Best Answer

A very small $\det(R)$ only means that some of your variables are (almost) linearly dependent. Recall that $\det(R)$ equals the product of the eigenvalues of $R$, so at least one eigenvalue must be small, and an exact or near-exact linear dependence among the columns shows up as an eigenvalue that is (approximately) zero.
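A small sketch, assuming Python with NumPy and synthetic data, that checks both points: $\det(R)$ matches the product of the eigenvalues of $R$, and a column that is almost a linear combination of the others produces an eigenvalue close to zero. The variables `x1`, `x2`, `x3` and the noise level `1e-4` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
# x3 is almost a linear combination of x1 and x2 (tiny independent noise)
x3 = 2.0 * x1 - x2 + 1e-4 * rng.normal(size=N)
X = np.column_stack([x1, x2, x3])

R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)      # eigenvalues in ascending order

det_direct = np.linalg.det(R)
det_from_eigs = np.prod(eigvals)     # det(R) = product of eigenvalues
print("eigenvalues:", eigvals)
print("det(R) =", det_direct, "| product of eigenvalues =", det_from_eigs)
```

The smallest eigenvalue is tiny because `x3` carries almost no information beyond `x1` and `x2`; the other two eigenvalues stay of order 1.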

This only means that your data set contains some extra/redundant dimensions, and that PCA will be able to represent (almost) 100% of the variance with a smaller set of $p_\text{new} \le p - 1$ dimensions.
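As an illustration, here is a minimal sketch, assuming Python with NumPy, where PCA is taken to be the eigendecomposition of the correlation matrix of standardized data. The helper `n_components_for` and the threshold `1 - 1e-10` are hypothetical choices. With one exactly dependent column among $p = 4$, the first $p - 1 = 3$ PCs already reach (numerically) 100% of the variance, and projecting onto them reconstructs the standardized data up to rounding error.

```python
import numpy as np

def n_components_for(X, threshold=1 - 1e-10):
    """Hypothetical helper: smallest k whose cumulative explained variance
    reaches `threshold`, plus the max reconstruction error using k PCs."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize, so PCA works on R
    R = Z.T @ Z / len(Z)                       # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # sort PCs by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.argmax(cumulative >= threshold)) + 1
    W = eigvecs[:, :k]                         # loadings of the first k PCs
    Z_hat = (Z @ W) @ W.T                      # project, then reconstruct
    return k, np.max(np.abs(Z - Z_hat))

rng = np.random.default_rng(2)
x1, x2, x3 = rng.normal(size=(3, 1000))
X = np.column_stack([x1, x2, x3, x1 + x2])     # 4th column is exactly redundant

k, err = n_components_for(X)
print("components needed:", k, "| max reconstruction error:", err)
```

With a near (rather than exact) dependence, the same procedure keeps almost, but not exactly, 100% of the variance, and the threshold decides how much of the small residual variance is discarded.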
