Factor Analysis/PCA Rotations – Intuitive Reasons and Selection Guide

dimensionality-reduction, factor-analysis, factor-rotation, interpretation, pca

My Questions

  1. What is the intuitive reason behind doing rotations of factors in factor analysis (or components in PCA)?

    My understanding is that if variables are loaded almost equally on the top components (or factors), then it is obviously difficult to differentiate the components. In this case, one could use rotation to differentiate the components better. Is this correct?

  2. What are the consequences of doing rotations? What does this affect?

  3. How does one select an appropriate rotation?
    There are orthogonal rotations and oblique rotations. How should one choose between them, and what are the implications of this choice?

Please explain intuitively, with as few mathematical equations as possible. A few of the answers scattered around were math-heavy, but I am looking more for intuitive reasons and rules of thumb.

Best Answer

  1. Reason for rotation. Rotations are done for the sake of interpreting the extracted factors in factor analysis (or components in PCA, if you venture to use PCA as a factor-analytic technique). Your understanding is right: rotation is done in pursuit of some structure of the loading matrix, which may be called simple structure. This is when different factors tend to load different variables $^1$. [I believe it is more correct to say that "a factor loads a variable" than "a variable loads a factor", because it is the factor that is "in" or "behind" the variables, making them correlate; but you may say it as you like.] In a sense, typical simple structure is where "clusters" of correlated variables show up. You then interpret a factor as the meaning lying at the intersection of the meanings of the variables loaded well enough by that factor; thus, to receive different meanings, factors should load variables differentially. A rule of thumb is that a factor should decently load at least 3 variables.
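To see this in action, here is a minimal sketch (my own illustration, not part of the original answer) using scikit-learn's FactorAnalysis, which accepts a rotation argument since version 0.24. Two blocks of toy variables are each driven by one latent trait; varimax should make each factor load one block:

```python
# Minimal sketch: unrotated vs. varimax-rotated loadings on toy data.
# Assumes scikit-learn >= 0.24 (for the `rotation` parameter).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500
f1, f2 = rng.standard_normal(n), rng.standard_normal(n)  # two latent traits
noise = lambda: 0.5 * rng.standard_normal(n)
# Variables 0-2 are driven by f1, variables 3-5 by f2.
X = np.column_stack([f1 + noise() for _ in range(3)] +
                    [f2 + noise() for _ in range(3)])

for rot in (None, "varimax"):
    fa = FactorAnalysis(n_components=2, rotation=rot).fit(X)
    print(rot)
    print(np.round(fa.components_.T, 2))  # rows: variables, columns: factors
```

After rotation, each variable should load strongly on exactly one factor, which is precisely the "simple structure" described above.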

  2. Consequences. Rotation does not change the positions of the variables relative to each other in the space of the factors, i.e. correlations between variables are preserved. What changes are the coordinates of the variable vectors' end-points on the factor axes - the loadings (please search this site for "loading plot" and "biplot" for more)$^2$. After an orthogonal rotation of the loading matrix, factor variances change, but the factors remain uncorrelated and the variable communalities are preserved.
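A quick numeric check of these claims, with an invented toy loading matrix: any orthogonal rotation matrix $\bf Q$ satisfies $\bf QQ'=I$, so the row sums of squares (communalities) and the reproduced correlations $\bf AA'$ cannot change.

```python
# Sketch: orthogonal rotation preserves communalities and A A'.
import numpy as np

A = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9]])                        # toy unrotated loadings
theta = np.deg2rad(30)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a 2-D rotation matrix
A_rot = A @ Q

print(np.allclose((A**2).sum(axis=1), (A_rot**2).sum(axis=1)))  # True: communalities
print(np.allclose(A @ A.T, A_rot @ A_rot.T))                    # True: reproduced correlations
```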

In an oblique rotation, factors are allowed to lose their uncorrelatedness if that produces a clearer "simple structure". However, interpreting correlated factors is a more difficult art, because you have to derive the meaning of one factor so that it does not contaminate the meaning of another factor it correlates with. That implies that you have to interpret the factors, let us say, in parallel rather than one by one. An oblique rotation leaves you with two matrices of loadings instead of one: the pattern matrix $\bf P$ and the structure matrix $\bf S$. ($\bf S=PC$, where $\bf C$ is the matrix of correlations between the factors; $\bf C=Q'Q$, where $\bf Q$ is the matrix of the oblique rotation: $\bf S=AQ$, where $\bf A$ was the loading matrix prior to any rotation.) The pattern matrix holds the regression weights by which the factors predict the variables, while the structure matrix holds the correlations (or covariances) between factors and variables. Most of the time we interpret factors by pattern loadings, because these coefficients represent the unique individual investment of a factor in a variable. Oblique rotation preserves the variable communalities, but the communalities are no longer equal to the row sums of squares in $\bf P$ or in $\bf S$. Moreover, because the factors correlate, their variances partly superimpose$^3$.
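Here is a small numeric illustration of these matrix relations (toy numbers of my own, not real data): the structure matrix follows from the pattern matrix and the factor correlations, and the communalities come from $\bf PCP'$ rather than from row sums of squares of $\bf P$ or $\bf S$ alone.

```python
# Sketch of the identities S = P C and communality_j = (P C P')_jj.
import numpy as np

P = np.array([[0.75, 0.05],
              [0.70, 0.10],
              [0.05, 0.80]])   # toy pattern loadings (regression weights)
C = np.array([[1.0, 0.4],
              [0.4, 1.0]])     # toy factor correlation matrix
S = P @ C                      # structure loadings: factor-variable correlations

communalities = np.diag(P @ C @ P.T)
print(np.round(S, 3))
print(np.round(communalities, 3))  # not equal to (P**2).sum(axis=1) when C != I
```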


Both orthogonal and oblique rotations, of course, affect the factor/component scores which you might want to compute (please search "factor scores" on this site). Rotation, in effect, gives you factors other than those you had right after the extraction$^4$. They inherit the predictive power (for the variables and their correlations), but they will receive a different substantive meaning from you. After rotation you may not say "this factor is more important than that one", because the factors were rotated vis-a-vis each other (to be honest, in FA, unlike PCA, you may hardly say it even right after the extraction, because the factors are modelled as already "important").

  3. Choice. There are many forms of orthogonal and oblique rotations. Why? First, because the concept of "simple structure" is not univocal and can be formulated somewhat differently. For example, varimax - the most popular orthogonal method - tries to maximize the variance of the squared loadings of each factor; the sometimes-used orthogonal method quartimax minimizes the number of factors needed to explain a variable and often produces the so-called "general factor". Second, different rotations aim at different side objectives apart from simple structure. I won't go into the details of these complex topics, but you might want to read about them for yourself.
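For the curious, here is a compact sketch of the classic SVD-based varimax iteration (without Kaiser normalization); the function name, tolerance, and iteration cap are my own choices:

```python
# Sketch: varimax rotation of a loading matrix A via the standard
# SVD-based iteration (no Kaiser normalization).
import numpy as np

def varimax(A, tol=1e-8, max_iter=500):
    p, k = A.shape
    Q = np.eye(k)                       # accumulated rotation matrix
    crit_old = 0.0
    for _ in range(max_iter):
        L = A @ Q
        # Gradient of the varimax criterion with respect to the rotation.
        G = A.T @ (L**3 - L * (L**2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(G)
        Q = u @ vt                      # nearest orthogonal matrix to G
        crit = s.sum()
        if crit < crit_old * (1 + tol):
            break
        crit_old = crit
    return A @ Q, Q
```

Applied to a toy loading matrix like the one above, varimax(A) returns rotated loadings whose squared values have higher variance within each column, i.e. a structure closer to "simple".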

Should one prefer orthogonal or oblique rotation? Well, orthogonal factors are easier to interpret, and the whole factor model is statistically simpler (orthogonal predictors, of course). But then you impose orthogonality on the latent traits you want to discover; are you sure they should be uncorrelated in the field you study? What if they are not? Oblique rotation methods$^5$ (albeit each having its own inclinations) allow, but do not force, factors to correlate, and are thus less restrictive. If an oblique rotation shows that the factors are only weakly correlated, you may be confident that "in reality" it is so, and then you may turn to an orthogonal rotation with good conscience. If, on the other hand, the factors are very much correlated, it looks unnatural for conceptually distinct latent traits (especially if you are developing an inventory in psychology or the like - recall that a factor is itself a univariate trait, not a batch of phenomena), and you might want to extract fewer factors, or alternatively to use the oblique results as the source from which to extract so-called second-order factors.


$^1$ Thurstone put forward five ideal conditions of simple structure. The three most important are: (1) each variable must have at least one near-zero loading; (2) each factor must have near-zero loadings for at least $m$ variables ($m$ being the number of factors); (3) for each pair of factors, there are at least $m$ variables with loadings near zero for one of them and far enough from zero for the other. Consequently, for each pair of factors, their loading plot should ideally look something like this:

[Figure: an idealized loading plot for a pair of factors]

This is for purely exploratory FA; if instead you are doing and redoing FA to develop a questionnaire, you will eventually want to drop all points except the blue ones, provided you have only two factors. If there are more than two factors, you will want the red points to become blue on some of the other factors' loading plots.
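If you want to screen a loading matrix against these three conditions programmatically, a rough sketch follows; the near-zero and salient thresholds are arbitrary choices of mine, not published standards.

```python
# Sketch: check Thurstone's three main simple-structure conditions.
import numpy as np

def check_simple_structure(A, near_zero=0.10, salient=0.40):
    p, m = A.shape
    small = np.abs(A) < near_zero
    cond1 = small.any(axis=1).all()          # (1) every variable has a near-zero loading
    cond2 = (small.sum(axis=0) >= m).all()   # (2) every factor has >= m near-zero loadings
    cond3 = all(                             # (3) for each factor pair, >= m variables
        ((small[:, i] & (np.abs(A[:, j]) >= salient)) |   # near zero on one factor,
         (small[:, j] & (np.abs(A[:, i]) >= salient))     # salient on the other
         ).sum() >= m
        for i in range(m) for j in range(i + 1, m))
    return cond1, cond2, cond3
```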


$^2$

[Figure: a loading plot showing variable vectors whose end-point coordinates on the factor axes are the loadings]


$^3$ The variance of a factor (or component) is the sum of its squared structure loadings in $\bf S$, since these are the covariances/correlations between the variables and the (unit-scaled) factor. After an oblique rotation factors can become correlated, and so their variances intersect. Consequently, the sum of their variances, the SS in $\bf S$, exceeds the overall communality explained, the SS in $\bf A$. If you want to credit factor $i$ with only the unique, "clean" portion of its variance, multiply that variance by $1-R_i^2$ of the factor's dependence on the other factors, the quantity known as the anti-image; it is the reciprocal of the $i$-th diagonal element of $\bf C^{-1}$. The sum of the "clean" portions of the variances will be less than the overall communality explained.
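A numeric sketch of this footnote with invented toy matrices: it computes the factor variances from $\bf S$, the total communality, and the anti-image multiplier $1-R_i^2$ from the diagonal of $\bf C^{-1}$.

```python
# Sketch: variance overlap of correlated factors and the anti-image.
import numpy as np

P = np.array([[0.75, 0.05],
              [0.70, 0.10],
              [0.05, 0.80]])   # toy pattern loadings
C = np.array([[1.0, 0.4],
              [0.4, 1.0]])     # toy factor correlations
S = P @ C

factor_var = (S**2).sum(axis=0)                  # SS per column of S
total_communality = np.diag(P @ C @ P.T).sum()   # SS "in A"
anti_image = 1.0 / np.diag(np.linalg.inv(C))     # = 1 - R_i^2 per factor

print(factor_var.sum(), total_communality)  # for these numbers the sum exceeds the communality
print(anti_image)                           # 0.84 for each factor here (R_i^2 = 0.16)
print(factor_var * anti_image)              # "clean" portions of the variances
```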


$^4$ You may not say "the 1st factor/component changed in rotation in this or that way" because the 1st factor/component in the rotated loading matrix is a different factor/component than the 1st one in the unrotated loading matrix. The same ordinal number ("1st") is misleading.


$^5$ The two most important oblique methods are promax and oblimin. Promax is an oblique enhancement of varimax: the varimax-based structure is then relaxed in order to meet "simple structure" to a greater degree. It is often used in confirmatory FA. Oblimin is very flexible thanks to its parameter gamma: when set to 0, it makes oblimin the quartimin method, yielding the most oblique solutions; a gamma of 1 yields the least oblique solutions, covarimin, which is yet another varimax-based oblique method, an alternative to promax. All oblique methods come in direct (=primary) and indirect (=secondary) versions - see the literature. All rotations, both orthogonal and oblique, can be done with Kaiser normalization (usually) or without it; the normalization makes all variables equally important in the rotation.
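If you want to try these methods in code, the third-party factor_analyzer package provides a Rotator class; the exact method names and the gamma argument below are assumptions about that library's API, so check its documentation before relying on them.

```python
# Hedged sketch: rotating a toy loading matrix with factor_analyzer.
# The Rotator class, its `method` values, and the `gamma` argument are
# assumptions about this library's API; verify against its docs.
import numpy as np
from factor_analyzer.rotator import Rotator

A = np.array([[0.8, 0.3],
              [0.7, 0.4],
              [0.2, 0.9]])   # toy unrotated loadings

for method, kwargs in [("varimax", {}),
                       ("promax", {}),
                       ("oblimin", {"gamma": 0})]:  # gamma=0 -> quartimin
    rotated = Rotator(method=method, **kwargs).fit_transform(A)
    print(method)
    print(np.round(rotated, 2))
```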


Some threads for further reading:

Can there be reason not to rotate factors at all?

Which matrix to interpret after oblique rotation - pattern or structure?

What do the names of factor rotation techniques (varimax, etc.) mean? (A detailed answer with formulae and pseudocode for the orthogonal factor rotation methods)

Is PCA with components rotated still PCA or is a factor analysis?
