Standardized (to unit variance) principal components after an orthogonal rotation, such as varimax, are simply rotated standardized principal components (by "principal component" I mean PC scores). In linear regression, scaling of individual predictors has no effect and replacing predictors by their linear combinations (e.g. via a rotation) has no effect either. This means that using any of the following in a regression:
- "raw" principal components (projections on the cov. matrix eigenvectors),
- standardized principal components,
- rotated [standardized] principal components,
- arbitrarily scaled rotated [standardized] principal components,
would lead to exactly the same regression model with identical $R^2$, predictive power, etc. (Individual regression coefficients will of course depend on the normalization and rotation choice.)
The total variance captured by the raw and by the rotated PCs is the same.
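As a quick numerical check of the invariance claim, here is a minimal base-R sketch. The data set (`mtcars`), the choice of four predictors, and `k = 2` are arbitrary assumptions made for illustration and are not part of the original question.

```r
# Regress mpg on k = 2 components of four standardized mtcars predictors
# (data set, predictors and k are arbitrary choices for illustration).
X <- scale(mtcars[, c("disp", "hp", "wt", "qsec")])
y <- mtcars$mpg
k <- 2

pca <- prcomp(X)
raw_pcs <- pca$x[, 1:k]                                  # "raw" PC scores
std_pcs <- scale(raw_pcs)                                # unit-variance PC scores
loadings <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])  # eigenvectors * sqrt(eigenvalues)
vm <- varimax(loadings)
rot_pcs <- std_pcs %*% vm$rotmat                         # varimax-rotated standardized scores
rescaled_pcs <- rot_pcs %*% diag(c(3, 0.5))              # arbitrary rescaling of each column

# R^2 of the regression of y on a given set of component scores
r2 <- function(Z) summary(lm(y ~ Z))$r.squared
r2_values <- c(raw = r2(raw_pcs), standardized = r2(std_pcs),
               rotated = r2(rot_pcs), rescaled = r2(rescaled_pcs))
print(r2_values)   # all four values should agree

# Total variance captured, measured as the sum of squared loadings
rot_loadings <- unclass(vm$loadings)
print(c(raw = sum(loadings^2), rotated = sum(rot_loadings^2)))
```

The individual coefficients of the four fits differ, but the fitted values and $R^2$ do not, because all four score matrices span the same column space.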
This answers your main question. However, you should be careful with your workflows, as it is very easy to get confused and mess up the calculations. The simplest way to obtain standardized rotated PC scores is to use the psych::principal function:
psych::principal(data, rotate="varimax", nfactors=k, scores=TRUE)
Your workflow #2 can be trickier than you think, because loadings after varimax rotation are not orthogonal, so to obtain the scores you cannot simply project the data onto the rotated loadings. See my answer here for details:
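The non-orthogonality of rotated loadings can be checked directly. This is a minimal base-R sketch; the iris data and `k = 2` are assumptions chosen only for illustration.

```r
X <- scale(iris[, 1:4])
k <- 2
pca <- prcomp(X)
raw_loadings <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])
rot_loadings <- unclass(varimax(raw_loadings)$loadings)

# Columns of the raw loadings are orthogonal: off-diagonal cross-products are ~0
print(round(crossprod(raw_loadings), 3))

# After varimax the off-diagonal cross-products are generally non-zero
print(round(crossprod(rot_loadings), 3))
```

Because the second cross-product matrix is not diagonal, plain projection onto the rotated loading directions does not give the rotated scores.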
Your workflow #3 is probably also wrong, at least if you refer to the psych::fa function. It does not do PCA; the fm="pa" extraction method refers to the "principal factor" method, which is based on PCA but is not identical to PCA (it is an iterative method). As I wrote above, you need psych::principal to perform PCA.
See my answer in the following thread for a detailed account on PCA and varimax:
I reran your analysis in SPSS (I don't have Stata, and I didn't rerun it in Matlab this time).
The crux of your mistaken analysis is that you somehow managed to rotate eigenvectors, whereas rotations are normally applied to loadings. Please read my recent answers about eigenvectors/loadings and about rotations.
Your first analysis extracted all 5 components. I can confirm (in SPSS) the eigenvalues and the eigenvectors you displayed. One would then expect you to request the loadings (the eigenvectors scaled by the square roots of the respective eigenvalues, so that each column's sum of squares equals its eigenvalue), which are:
         Component
         1      2      3      4      5
V1    .943   .050  -.114  -.170  -.258
V2   -.078   .975  -.205   .041   .014
V3    .920  -.007  -.151  -.289   .218
V4    .844  -.118  -.267   .449   .037
V5    .595   .226   .766   .085   .021
Then this matrix after varimax rotation will be:
         Component
         1      2      3      4      5
V1    .831   .247   .371   .012   .334
V2   -.014   .014  -.044   .999   .002
V3    .924   .188   .300  -.032  -.142
V4    .442   .124   .886  -.063   .027
V5    .215   .970   .107   .015   .021
Rotation Method: Varimax without Kaiser Normalization.
with the rotation transformation matrix:
         1      2      3      4      5
1     .760   .387   .513  -.050   .078
2     .018   .225  -.105   .968   .021
3    -.251   .884  -.317  -.235  -.011
4    -.595   .132   .790   .066  -.005
5     .066   .025   .038   .019  -.997
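The relation between the three tables above can be verified by hand: the rotated loadings should equal the unrotated loadings multiplied by the rotation transformation matrix, up to the three-decimal rounding of the SPSS output. A minimal R sketch, with the numbers copied from the tables (the variable names `A`, `A_rot` and `Tm` are introduced here for illustration):

```r
vars <- paste0("V", 1:5)
comps <- as.character(1:5)

# Unrotated loadings (first table)
A <- matrix(c( .943,  .050, -.114, -.170, -.258,
              -.078,  .975, -.205,  .041,  .014,
               .920, -.007, -.151, -.289,  .218,
               .844, -.118, -.267,  .449,  .037,
               .595,  .226,  .766,  .085,  .021),
            nrow = 5, byrow = TRUE, dimnames = list(vars, comps))

# Varimax-rotated loadings (second table)
A_rot <- matrix(c( .831,  .247,  .371,  .012,  .334,
                  -.014,  .014, -.044,  .999,  .002,
                   .924,  .188,  .300, -.032, -.142,
                   .442,  .124,  .886, -.063,  .027,
                   .215,  .970,  .107,  .015,  .021),
                nrow = 5, byrow = TRUE, dimnames = list(vars, comps))

# Rotation transformation matrix (third table)
Tm <- matrix(c( .760,  .387,  .513, -.050,  .078,
                .018,  .225, -.105,  .968,  .021,
               -.251,  .884, -.317, -.235, -.011,
               -.595,  .132,  .790,  .066, -.005,
                .066,  .025,  .038,  .019, -.997),
             nrow = 5, byrow = TRUE)

max_diff  <- max(abs(A %*% Tm - A_rot))                # rotated = unrotated %*% rotation
orth_err  <- max(abs(crossprod(Tm) - diag(5)))         # rotation matrix is orthogonal
comm_diff <- max(abs(rowSums(A^2) - rowSums(A_rot^2))) # row sums of squares are preserved
print(c(max_diff = max_diff, orth_err = orth_err, comm_diff = comm_diff))
```

All three discrepancies should be of the order of the rounding error in the printed tables.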
You rotated the matrix of eigenvectors, not loadings. The eigenvector matrix in PCA is itself a special case of an orthogonal rotation matrix: its column sums-of-squares are 1, its row sums-of-squares are 1, and the cross-products of its columns are 0. When such a matrix is rotated orthogonally to a "simple structure", such as by the varimax method, it will inevitably turn into a very simple form like the one you got in your rotated components table, with 0 and 1 values only. Each column contains only one 1 and each row contains only one 1; the exact positions of the 1s may be shuffled, but the same simple structure persists. For example, the SPSS varimax rotation gave me this in your place:
         Component
         1      2      3      4      5
V1    .000   .000   .000  1.000   .000
V2    .000  1.000   .000   .000   .000
V3    .000   .000  1.000   .000   .000
V4   1.000   .000   .000   .000   .000
V5    .000   .000   .000   .000  1.000
Rotation Method: Varimax without Kaiser Normalization.
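Why a full eigenvector matrix collapses to 0s and 1s can be illustrated in a few lines of base R. This is a sketch on the iris correlation matrix, used here only as a stand-in because the original data are not available; `V` and `rotated` are illustrative names.

```r
V <- eigen(cor(iris[, 1:4]))$vectors   # eigenvectors as columns
p <- ncol(V)

# V is orthogonal: unit column and row sums of squares, zero cross-products
print(round(crossprod(V), 10))
print(rowSums(V^2))

# The orthogonal rotation t(V) turns V into the identity matrix, which is the
# simplest possible structure (a single 1 in every row and every column)
rotated <- V %*% t(V)
print(round(rotated, 10))

# With all column sums of squares fixed at 1, raw varimax amounts to maximizing
# the sum of fourth powers; the identity attains the upper bound p
print(c(eigenvectors = sum(V^4), identity = sum(rotated^4)))

# Running varimax itself is expected to end up close to a signed, permuted identity
print(round(unclass(varimax(V, normalize = FALSE)$loadings), 3))
```

Since a perfectly simple structure is always reachable by an orthogonal rotation of an orthogonal matrix, the result carries no information about the data.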
In your second analysis you retained and rotated 3 of the total 5 components. Since you discarded the last two columns of the eigenvector matrix, the row sums-of-squares were no longer 1, and so varimax gave you a simple structure consisting of fractional values rather than 0 and 1. But the crux remains: you again rotated the wrong matrix. You ought to have rotated the loading matrix, not the eigenvector matrix.
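The effect of dropping columns can be seen in a short base-R sketch (again on the iris correlation matrix as an assumed stand-in, retaining 2 of 4 eigenvectors): an orthogonal rotation preserves row sums of squares, so once they fall below 1 no entry can reach 1 in absolute value.

```r
V <- eigen(cor(iris[, 1:4]))$vectors
V_kept <- V[, 1:2]                 # retain 2 of 4 eigenvectors (arbitrary choice)

print(rowSums(V^2))                # full matrix: all equal to 1
print(rowSums(V_kept^2))           # truncated matrix: below 1
print(colSums(V_kept^2))           # columns still have unit length

# Rotation leaves the row sums of squares unchanged
V_rot <- unclass(varimax(V_kept, normalize = FALSE)$loadings)
print(rowSums(V_rot^2))
```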
Also, in most cases it is better not to switch off Kaiser normalization when doing loadings rotation.
P.S. The Stata documentation clearly states that the pca function computes and rotates only eigenvectors. Stata does, though, compute and rotate loadings via a special postestimation command:
Remark: Literature and software that treat principal components in combination with factor analysis tend to display principal components normed to the associated eigenvalues rather than to 1. This normalization is available in the postestimation command estat loadings; see [MV] pca postestimation.
Best Answer
"Rotations" is an approach developed in factor analysis; there rotations (such as e.g. varimax) are applied to loadings, not to eigenvectors of the covariance matrix. Loadings are eigenvectors scaled by the square roots of the respective eigenvalues. After the varimax rotation, the loading vectors are not orthogonal anymore (even though the rotation is called "orthogonal"), so one cannot simply compute orthogonal projections of the data onto the rotated loading directions.
@FTusell's answer assumes that varimax rotation is applied to the eigenvectors (not to loadings). This would be pretty unconventional. Please see my detailed account of PCA+varimax for details: Is PCA followed by a rotation (such as varimax) still PCA? Briefly, if we look at the SVD of the data matrix $X=USV^\top$, then to rotate the loadings means inserting $RR^\top$ for some rotation matrix $R$ as follows: $X=(UR)(R^\top SV^\top).$
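The identity above can be spelled out a little further. This is a sketch under stated assumptions: the symbols $n$, $L$, $Z$, $L_\mathrm{rot}$ and $Z_\mathrm{rot}$ are introduced here for illustration, $X$ is assumed to be centered with $n$ rows, and $L$ and $Z$ denote loadings and standardized scores.

$$
\begin{aligned}
L &= \frac{VS}{\sqrt{n-1}}, \qquad Z = \sqrt{n-1}\,U, \qquad X = USV^\top = ZL^\top,\\
X &= (UR)(R^\top SV^\top) = (ZR)(LR)^\top = Z_\mathrm{rot}L_\mathrm{rot}^\top,\\
L_\mathrm{rot}^\top L_\mathrm{rot} &= \frac{R^\top S^2 R}{n-1} \quad \text{(generally not diagonal)},\\
Z_\mathrm{rot} &= X L_\mathrm{rot}\,(L_\mathrm{rot}^\top L_\mathrm{rot})^{-1} = X\,(L_\mathrm{rot}^{+})^\top .
\end{aligned}
$$

If only $k$ components are retained, the first two lines hold for the rank-$k$ approximation of $X$, while the last line still holds for $X$ itself, because the discarded components are orthogonal to the retained loadings. The third line is the reason the rotated loading vectors are no longer orthogonal, and the last line is the "transposed pseudo-inverse" recipe for the scores.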
If rotation is applied to loadings (as it usually is), then there are at least three easy ways to compute varimax-rotated PCs in R:
1. They are readily available via the psych::principal function (demonstrating that this is indeed the standard approach). Note that it returns standardized scores, i.e. all PCs have unit variance.
2. One can manually use the varimax function to rotate the loadings, and then use the new rotated loadings to obtain the scores; one needs to multiply the data by the transposed pseudo-inverse of the rotated loadings (see formulas in this answer by @ttnphns). This will also yield standardized scores.
3. One can use the varimax function to rotate the loadings, and then use the $rotmat rotation matrix to rotate the standardized scores obtained with prcomp.
All three methods yield the same result, i.e. three identical sets of scores.
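A minimal sketch of the three approaches follows. The iris data and `ncomp = 2` are assumptions chosen for illustration, and the pseudo-inverse is written out with base R instead of a dedicated package function. Method 1 is only run if the psych package happens to be installed.

```r
irisX <- iris[, 1:4]
ncomp <- 2

pca <- prcomp(irisX, center = TRUE, scale. = TRUE)
raw_loadings <- pca$rotation[, 1:ncomp] %*% diag(pca$sdev[1:ncomp])
vm <- varimax(raw_loadings)
rot_loadings <- unclass(vm$loadings)

# Method 2: data times the transposed pseudo-inverse of the rotated loadings.
# For a full-column-rank matrix L, t(pinv(L)) equals L %*% solve(t(L) %*% L).
inv_loadings <- rot_loadings %*% solve(crossprod(rot_loadings))
scores_m2 <- scale(irisX) %*% inv_loadings

# Method 3: rotate the standardized prcomp scores with the $rotmat matrix
scores_m3 <- scale(pca$x[, 1:ncomp]) %*% vm$rotmat

print(head(scores_m2, 5))
print(head(scores_m3, 5))
print(all.equal(unname(scores_m2), unname(scores_m3)))

# Method 1: psych::principal, run only if the package is available
if (requireNamespace("psych", quietly = TRUE)) {
  fit <- psych::principal(irisX, rotate = "varimax", nfactors = ncomp, scores = TRUE)
  print(head(fit$scores, 5))
}
```

Methods 2 and 3 agree to numerical precision, and both return scores with unit variance.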
Note: The varimax function in R uses normalize = TRUE, eps = 1e-5 parameters by default (see documentation). One might want to change these parameters (decrease the eps tolerance and take care of Kaiser normalization) when comparing the results to other software such as SPSS. I thank @GottfriedHelms for bringing this to my attention. [Note: these parameters work when passed to the varimax function, but do not work when passed to the psych::principal function. This appears to be a bug that will be fixed.]
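For reference, passing these parameters to `varimax` directly looks as follows. This is a sketch on the iris data (an assumption for illustration); the particular tolerance `1e-12` is an arbitrary choice, not a recommended value.

```r
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
raw_loadings <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

# Defaults: Kaiser normalization on, eps = 1e-5
vm_default <- varimax(raw_loadings)

# No Kaiser normalization and a tighter convergence tolerance
vm_strict <- varimax(raw_loadings, normalize = FALSE, eps = 1e-12)

print(round(unclass(vm_default$loadings), 3))
print(round(unclass(vm_strict$loadings), 3))

# In both cases the rotated loadings equal the raw loadings times $rotmat
print(all.equal(unclass(vm_strict$loadings), raw_loadings %*% vm_strict$rotmat,
                check.attributes = FALSE))
```

The two settings generally give slightly different rotations, which is worth keeping in mind when a comparison with SPSS output does not match to the last digit.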