Solved – Using PCA in MATLAB: is it based on the covariance or correlation matrix?

eigenvalues, MATLAB, pca

I want to produce a scree plot to assess if there is an 'elbow' in the eigenvalues to aid in my identification of the number of PCs to retain.
However, upon reading further into the topic, I realised that the eigenvalues are only plotted when the correlation matrix is used and that the log of the eigenvalues is required if the PCA used the covariance matrix.

I'm not entirely clear on the difference between these two, but I used 'pca' in MATLAB to carry out my analysis, and the documentation says that the 'latent' output (i.e. the eigenvalues) contains 'the eigenvalues of the covariance matrix of X' (X is the data).

I normalised my data using zscore prior to executing pca. Does that make a difference?
My ultimate question is: can I use the eigenvalues in my scree plot, or do I have to get the log of them to plot?

Best Answer

Regarding the question in the title: the function pca in MATLAB uses the SVD of the centred dataset to perform PCA; this excellent thread elucidates the relation between the two. Using the SVD of the centred data corresponds to using the covariance matrix, not the correlation matrix.
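To make the SVD/covariance correspondence concrete, here is a minimal sketch on made-up data (the matrix X, its size and the column scales are assumptions for illustration only). It compares the 'latent' output of pca with the squared singular values of the centred data divided by n - 1, and with the eigenvalues of cov(X); all three should agree up to floating-point error.

```matlab
% Hypothetical data: 200 observations of 4 variables on different scales
rng(0);
X = randn(200, 4) * diag([1 5 10 50]);

% Eigenvalues as returned by pca (columns are centred by default)
[~, ~, latent] = pca(X);

% Same quantities from the SVD of the centred data
Xc = X - mean(X, 1);                   % implicit expansion, R2016b or later
s = svd(Xc);                           % singular values, descending
latentSvd = s.^2 / (size(X, 1) - 1);

% ... and from the eigendecomposition of the covariance matrix
latentCov = sort(eig(cov(X)), 'descend');

% Both differences should be tiny relative to the eigenvalues themselves
disp(max(abs(latent - latentSvd)));
disp(max(abs(latent - latentCov)));
```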

Having said that, and to answer the main question of the post: if one z-scores the data and then uses the covariance matrix for PCA, the results will be equivalent to using the correlation matrix of the original data. This can be easily seen by computing the difference cov(zscore(A)) - corr(A), which should be zero to numerical precision (where A is the dataset matrix).
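The check can be run as a short script. This is only a sketch: the matrix A below is random data invented for the example, and zscore, corr and pca require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical dataset: 150 observations, 3 variables on very different scales
rng(1);
A = randn(150, 3) * diag([1 20 300]);

% Covariance of the z-scored data vs. correlation of the raw data
maxDiff = max(max(abs(cov(zscore(A)) - corr(A))));
disp(maxDiff);                         % expected to be near machine precision

% Consequently, pca on z-scored data returns the eigenvalues of corr(A)
[~, ~, latentZ] = pca(zscore(A));
C = corr(A);
C = (C + C') / 2;                      % enforce exact symmetry before eig
latentCorr = sort(eig(C), 'descend');
disp(max(abs(latentZ - latentCorr)));
```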

So yes, there will be a difference if you use the correlation-based instead of the covariance-based PCA methodology; if you $z$-score your dataset, though, the two methodologies will give equal results. In general, I would recommend that you $z$-score your variables before doing PCA, especially if they are measured on different scales. Otherwise the differences in their magnitudes can potentially dominate the subsequent eigenanalysis (and the interpretation of the final results). This topic is explored in more detail in this thread on doing PCA on correlation or covariance.
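As a rough illustration of the scaling issue and of the scree plot asked about in the question, the sketch below uses invented data in which one variable has a much larger variance than the others. On the raw data the first eigenvalue absorbs almost all of the variance; after z-scoring, the eigenvalues are those of the correlation matrix (they sum to the number of variables) and can be plotted directly. The data and variable names are assumptions made for this example.

```matlab
% Hypothetical data: 100 observations, 5 independent variables, very unequal scales
rng(2);
X = randn(100, 5) * diag([1 2 5 50 500]);

% Covariance-based PCA on the raw data: dominated by the largest-scale variable
[~, ~, latentRaw] = pca(X);
shareFirstRaw = latentRaw(1) / sum(latentRaw);

% PCA on z-scored data: equivalent to correlation-based PCA
[~, ~, latentZ, ~, explained] = pca(zscore(X));

fprintf('Share of variance in PC1 (raw data): %.3f\n', shareFirstRaw);
fprintf('Sum of eigenvalues (z-scored data): %.3f\n', sum(latentZ));

% Scree plot of the eigenvalues of the z-scored data
figure;
plot(1:numel(latentZ), latentZ, 'o-');
xlabel('Principal component');
ylabel('Eigenvalue');
title('Scree plot (z-scored data)');
```

The fifth output of pca, explained, gives the percentage of total variance per component, which can be plotted in the same way if a relative scale is preferred.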
