Solved – Correlation between principal components

Tags: canonical-correlation, correlation, eigenvalues, multivariate analysis, pca

I have two matrices a, b of dimensions (100×500), (100×15000) and I am trying to find associations between sets of variables in both matrices.

When I perform principal component analysis on matrix a, the highest loadings of the first principal component correspond to a set of variables that contribute the largest proportion of variability in this dataset. These variables are of interest in my research, and I would like to determine which variables in dataset b are associated with this principal component.

Therefore my question is:
If I perform principal component analysis on matrix b, can I perform correlations between the eigenvectors of a and the eigenvectors of b to determine if an association between these two datasets exists?

If such a correlation does exist, what exactly does a correlation between eigenvectors actually represent?

Best Answer

I assume that the matrices $A$ and $B$ each hold random variables in the columns and observations in the rows, or vice versa.

You can compare the eigenvectors of the covariance matrices of $A$ and $B$, using the angle between them as a measure of the correlation between them. But I don't know whether that will give you anything more than a qualitative idea. Of course, this only applies if the random variables in both matrices are the same; otherwise it is nonsense.
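To make the angle comparison concrete, here is a minimal sketch in R. It assumes two made-up samples of the same four variables (the matrices `A` and `B` below are simulated, not the asker's data) and compares only the leading eigenvectors.

```r
# Hypothetical data: two samples of the SAME 4 variables (simulated for illustration).
set.seed(1)
A <- matrix(rnorm(100 * 4), ncol = 4)
B <- matrix(rnorm(100 * 4), ncol = 4)

# Leading eigenvector of each covariance matrix (eigen() returns unit-length vectors).
vA <- eigen(cov(A))$vectors[, 1]
vB <- eigen(cov(B))$vectors[, 1]

# The sign of an eigenvector is arbitrary, so take the absolute cosine.
cos_angle <- abs(sum(vA * vB))
angle_deg <- acos(min(cos_angle, 1)) * 180 / pi  # min() guards against rounding above 1
angle_deg
```

An angle near 0 degrees means the two leading directions nearly coincide, and an angle near 90 degrees means they are close to orthogonal. The eigenvectors must have the same length for the dot product to be defined, which is why the variables have to match.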

Since $A$ and $B$ represent different observations from two random vectors $v_A$ and $v_B$, you may get more information from the covariance matrix of the vectors.
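One possible reading of this suggestion is to look at the cross-covariance between the columns of the two matrices, which only requires that the rows (observations) are matched. The sketch below uses small simulated stand-ins for the 100×500 and 100×15000 matrices. The planted association and the extra step of correlating the first principal component's scores with each column of `b` are assumptions added for illustration, not something the answer spells out.

```r
# Hypothetical small stand-ins for the real matrices; rows are matched observations.
set.seed(1)
n <- 100
a <- matrix(rnorm(n * 5), nrow = n)
b <- matrix(rnorm(n * 8), nrow = n)
b[, 1] <- a[, 1] + rnorm(n, sd = 0.5)  # plant one association for illustration

# Cross-covariance block: entry (i, j) is cov(a[, i], b[, j]).
cross_cov <- cov(a, b)                 # 5 x 8 matrix

# PC1 scores of a (one value per observation), correlated with each column of b.
pc1_scores <- prcomp(a)$x[, 1]
pc1_vs_b <- cor(pc1_scores, b)         # 1 x 8 matrix of correlations
round(pc1_vs_b, 2)
```

Note that the scores have one entry per observation, so they can be correlated with the columns of `b` even though the two matrices have different numbers of variables.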

Related Solutions

Solved – Principal components using correlation matrix in R

You can use eigen(). For example:

> set.seed(3)
> x <- matrix(rnorm(18), ncol=3)
> x
           [,1]        [,2]       [,3]
[1,]  1.2243136 -0.48445511  0.9006247
[2,]  0.1998116 -0.74107266  0.8517704
[3,] -0.5784837  1.16061578  0.7277152
[4,] -0.9423007  1.01206712  0.7365021
[5,] -0.2037282 -0.07207847 -0.3521296
[6,] -1.6664748 -1.13678230  0.7055155

> prcomp(x)
Standard deviations:
[1] 1.0294417 0.9046837 0.4672911

Rotation:
             PC1         PC2         PC3
[1,] -0.84047203 -0.53902142  0.05534150
[2,]  0.53878561 -0.84219645 -0.02037687
[3,] -0.05759199 -0.01269102 -0.99825954

> eigen(cov(x))
$values
[1] 1.0597501 0.8184527 0.2183610

$vectors
            [,1]       [,2]        [,3]
[1,]  0.84047203 0.53902142 -0.05534150
[2,] -0.53878561 0.84219645  0.02037687
[3,]  0.05759199 0.01269102  0.99825954

So the eigenvalues of the covariance matrix are the squares of the standard deviations (i.e., the variances) of the principal components, and the principal component directions (the columns of the rotation matrix) are the same as the eigenvectors of the covariance matrix, though the signs may be flipped, as they are here.
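The same relationship holds for the correlation matrix, which is what the question title asks about. As a minimal sketch, assuming the same kind of simulated `x` as above: calling `prcomp()` with `scale. = TRUE` standardizes the columns first, so its results should line up with `eigen(cor(x))`.

```r
set.seed(3)
x <- matrix(rnorm(18), ncol = 3)

pc <- prcomp(x, scale. = TRUE)  # standardize columns, i.e. work on the correlation scale
ev <- eigen(cor(x))

# Component variances equal the eigenvalues of the correlation matrix.
all.equal(pc$sdev^2, ev$values)

# Directions agree up to sign, so compare absolute values.
all.equal(abs(unname(pc$rotation)), abs(ev$vectors))
```

Comparing absolute values sidesteps the arbitrary sign of each eigenvector.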

Solved – Very different results of principal component analysis in SPSS and Stata after rotation

You are correct. Stata is weird about this. Stata gives different results from SAS, R and SPSS, and it is difficult (in my opinion) to understand why without delving quite deep into the world of factor analysis and PCA.

Here's how you know that something weird is happening. The sum of the squared loadings for a component is equal to the eigenvalue for that component.

Rotation changes the individual eigenvalues, but their total does not change. Add up the squared loadings from your output (this is why I asked you, in my comment, to remove the blanks). With Stata's default, the squared loadings for each component will sum to 1.00 (within rounding error). With SPSS (and R, and SAS, and every other factor analysis program I've looked at) they will sum to the eigenvalue for that factor. In SPSS, the total of the squared loadings is equal to the sum of the eigenvalues (i.e. 3.8723 + 1.40682), both pre- and post-rotation.

In Stata, the sum of the squared loadings for each factor is equal to 1.00, and so Stata has rescaled the loadings.
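The two scalings can be reproduced in R as a rough illustration. This sketch uses the built-in `USArrests` data and keeps two components, both arbitrary choices made here; it mimics the behaviour described above and is not a reconstruction of what Stata or SPSS do internally.

```r
ev <- eigen(cor(USArrests))
k <- 2  # number of retained components (arbitrary choice for this sketch)

unit_vecs <- ev$vectors[, 1:k]                          # unit-length eigenvectors
loadings  <- unit_vecs %*% diag(sqrt(ev$values[1:k]))   # scaled by sqrt(eigenvalue)

colSums(unit_vecs^2)  # each column sums to 1
colSums(loadings^2)   # each column sums to its eigenvalue

# Varimax is an orthogonal rotation: per-component sums change, the total does not.
rotated <- unclass(varimax(loadings)$loadings)
ss_rotated <- colSums(rotated^2)
ss_rotated
sum(ss_rotated)       # equals sum(ev$values[1:k])
```

If the per-component sums of squares in a program's output are all 1.00, the loadings are on the unit-eigenvector scale rather than the eigenvalue scale.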

The only mention of this (that I have found) in the Stata documentation is in the estat loadings section of the help, where it says:

cnorm(unit | eigen | inveigen), an option used with estat loadings, selects the normalization of the eigenvectors, the columns of the principal-component loading matrix. The following normalizations are available

However, this appears to apply only to the unrotated component matrix, not the rotated component matrix. I can't get the unnormalized rotated matrix after PCA.

The people at Stata seem to know what they are doing, and usually have a good reason for doing things the way that they do. This one is beyond me though.

(For future reference, it would have made my life easier if you'd used a dataset that I could access, and if you'd included all output, without blanks).

Edit: My usual go-to site for information about how to get the same results for different programs is the UCLA IDRE. They don't cover PCA in Stata: http://www.ats.ucla.edu/stat/AnnotatedOutput/ I have to wonder if that's because they couldn't get the same result. :)
