Well, @Srikant already gave you the right answer: the rotation (or loadings) matrix contains the eigenvectors arranged column-wise, so you just have to multiply (using %*%) your vector or matrix of new data by, e.g., prcomp(X)$rotation. Be careful, however, with any centering or scaling that was applied when computing the PCA eigenvectors; the new data must be transformed the same way first.
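A minimal sketch of the manual projection in R, assuming simulated data (X, p and the other object names are illustrative only). prcomp() records the centering and scaling it used in $center and $scale, so the same transformation is applied before multiplying by $rotation:

```r
set.seed(1)
# Illustrative data: 100 observations of 4 variables
X <- matrix(rnorm(400), nrow = 100, ncol = 4)
X[, 2] <- X[, 1] + 0.5 * X[, 2]   # add some correlation between columns 1 and 2

# PCA on standardized variables
p <- prcomp(X, center = TRUE, scale. = TRUE)

# Apply the same centering/scaling that prcomp used, then rotate
Z <- scale(X, center = p$center, scale = p$scale)
scores_manual <- Z %*% p$rotation

# Matches the scores stored by prcomp (up to floating-point error)
all.equal(unname(scores_manual), unname(p$x))        # TRUE

# Skipping the centering/scaling gives different scores
scores_raw <- X %*% p$rotation
isTRUE(all.equal(unname(scores_raw), unname(p$x)))   # FALSE
```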
In R, you may also find the predict() function useful; see ?predict.prcomp. By the way, you can check how the projection of new data is implemented by simply entering:
getS3method("predict", "prcomp")
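As a sketch, again on simulated data with hypothetical column names a, b and c, predict() on a prcomp object should agree with the manual formula, since it reuses the training $center and $scale:

```r
set.seed(2)
# Illustrative training data and new observations with the same named columns
X_train <- matrix(rnorm(300), ncol = 3,
                  dimnames = list(NULL, c("a", "b", "c")))
X_new   <- matrix(rnorm(30), ncol = 3,
                  dimnames = list(NULL, c("a", "b", "c")))

p <- prcomp(X_train, center = TRUE, scale. = TRUE)

# predict.prcomp centers/scales newdata with the training parameters
scores_pred <- predict(p, newdata = X_new)

# Equivalent manual projection
scores_manual <- scale(X_new, center = p$center, scale = p$scale) %*% p$rotation

all.equal(unname(scores_pred), unname(scores_manual))  # TRUE
```

When the original data had column names, predict() also selects the newdata columns by name, so it is a little safer than a bare %*% if the column order might differ.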
You are correct. Stata is weird about this. Stata gives different results from SAS, R and SPSS, and it is difficult (in my opinion) to understand why without delving quite deep into the world of factor analysis and PCA.
Here's how you know that something weird is happening. The sum of the squared loadings for a component is equal to the eigenvalue for that component.
Rotation changes the variance accounted for by each component (its post-rotation "eigenvalue"), but the total does not change. Add up the squared loadings from your output (this is why I asked you to remove the blanks in my comment). With Stata's default, the squared loadings for each component will sum to 1.00 (within rounding error). With SPSS (and R, and SAS, and every other factor analysis program I've looked at) they will sum to the eigenvalue for that component. Across both components, the sum of squared loadings in SPSS equals the sum of the eigenvalues (i.e. 3.8723 + 1.40682), both pre- and post-rotation.
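The check described above can be reproduced in R on simulated data (an assumption; this is not the poster's dataset). Unit-normalized eigenvectors give squared loadings that sum to 1 per component, which is the behaviour the answer attributes to Stata's default; scaling each eigenvector by the square root of its eigenvalue gives loadings whose squared sums equal the eigenvalues; and an orthogonal varimax rotation (stats::varimax) changes the per-component sums but not the total:

```r
set.seed(3)
# Illustrative correlated data: 200 observations, 5 variables
X <- matrix(rnorm(1000), ncol = 5)
X[, 2] <- X[, 1] + 0.3 * X[, 2]
X[, 4] <- X[, 3] + 0.3 * X[, 4]

p  <- prcomp(X, scale. = TRUE)
ev <- p$sdev^2           # eigenvalues of the correlation matrix
k  <- 2                  # keep two components

# Unit-normalized eigenvectors: squared entries sum to 1 per column
V <- p$rotation[, 1:k]
colSums(V^2)             # 1 for each component (up to rounding)

# Loadings = eigenvectors scaled by sqrt(eigenvalue):
# squared loadings sum to each component's eigenvalue
L <- V %*% diag(sqrt(ev[1:k]))
colSums(L^2)             # equals ev[1:k]

# Varimax redistributes variance between the retained components...
L_rot <- unclass(varimax(L)$loadings)
colSums(L_rot^2)         # generally different from ev[1:k]

# ...but the total sum of squared loadings is unchanged
c(sum(L_rot^2), sum(ev[1:k]))
```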
In Stata, the sum of the squared loadings for each factor is equal to 1.00, and so Stata has rescaled the loadings.
The only mention of this (that I have found) in the Stata documentation is in the estat loadings section of the help, where it says:
cnorm(unit | eigen | inveigen), an option used with estat loadings, selects the normalization of the eigenvectors, the columns of the principal-component loading matrix. The following normalizations are available
However, this appears to apply only to the unrotated component matrix, not the rotated component matrix. I can't get the unnormalized rotated matrix after PCA.
The people at Stata seem to know what they are doing, and usually have a good reason for doing things the way that they do. This one is beyond me though.
(For future reference, it would have made my life easier if you'd used a dataset that I could access, and if you'd included all of the output, without blanks.)
Edit: My usual go-to site for information about how to get the same results from different programs is UCLA IDRE. They don't cover PCA in Stata (http://www.ats.ucla.edu/stat/AnnotatedOutput/); I have to wonder if that's because they couldn't get the same result. :)
You get the coefficients from PCA. Multiplying your observation matrix by these coefficients gives the components (scores). So, for new data, multiply the new observation matrix by the same rotation matrix instead. Don't forget to center it first, using the means of the original data.
Here's the code.
Run PCA and see how the score matrix is obtained from the original data and the rotation. Note that I'm NOT centering here, and you probably should.
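A minimal sketch of this first step, assuming simulated data and an uncentered prcomp() fit to match the description:

```r
set.seed(4)
# Illustrative data: 50 observations of 3 variables
X <- matrix(rnorm(150), ncol = 3)

# PCA WITHOUT centering or scaling, as described above
p <- prcomp(X, center = FALSE, scale. = FALSE)

# The score matrix is just the data times the rotation
scores <- X %*% p$rotation
all.equal(unname(scores), unname(p$x))   # TRUE
```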
Now apply the same rotation to different data (again, note that I am NOT centering).
Note the similarity of the new scores.
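A sketch of the second step under the same assumptions; X_new is a hypothetical new dataset built as a small perturbation of X, which is why its scores end up close to the original ones. The last lines show the centered alternative, which subtracts the training means before rotating:

```r
set.seed(4)
X     <- matrix(rnorm(150), ncol = 3)                  # original data
X_new <- X + matrix(rnorm(150, sd = 0.1), ncol = 3)    # hypothetical new data: X plus small noise

p <- prcomp(X, center = FALSE)

# Same rotation applied to the new data (still NOT centering)
scores_new <- X_new %*% p$rotation

# Largest absolute difference from the original scores (small, since X_new is close to X)
max(abs(scores_new - p$x))

# Centered alternative: fit with centering and reuse the TRAINING means
pc <- prcomp(X, center = TRUE)
scores_new_c <- sweep(X_new, 2, pc$center) %*% pc$rotation
all.equal(unname(scores_new_c), unname(predict(pc, X_new)))  # TRUE
```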
By the way, this is used a lot in forecasting with PCA. We obtain the rotation on historical data, then apply it to new data.