First of all, I second ttnphns's recommendation to look at the solution before rotation. Factor analysis as implemented in SPSS is a complex procedure with several steps; comparing the results of each of these steps should help you pinpoint the problem.
Specifically you can run
FACTOR
/VARIABLES <variables>
/MISSING PAIRWISE
/ANALYSIS <variables>
/PRINT CORRELATION
/CRITERIA FACTORS(6) ITERATE(25)
/EXTRACTION ULS
/CRITERIA ITERATE(25)
/ROTATION NOROTATE.
to see the correlation matrix SPSS is using to carry out the factor analysis. Then, in R, prepare the correlation matrix yourself. Since the SPSS command above requests pairwise deletion, match it explicitly (the default use="everything" would propagate NAs):
r <- cor(data, use = "pairwise.complete.obs")
Any discrepancy in the way missing values are handled should be apparent at this stage. Once you have checked that the correlation matrix is the same, you can feed it to the fa function and run your analysis again:
fa.results <- fa(r, nfactors=6, rotate="promax",
scores=TRUE, fm="pa", oblique.scores=FALSE, max.iter=25)
If you still get different results in SPSS and R, the problem is not missing values-related.
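To see how much the missing-data treatment alone can move the correlations, here is a self-contained toy illustration (random data, purely for demonstration – the point is only that listwise and pairwise deletion give different matrices):

```r
set.seed(1)
m <- matrix(rnorm(100 * 6), ncol = 6)
m[sample(length(m), 30)] <- NA            # sprinkle in some missing values

# Listwise deletion drops every row containing an NA; pairwise deletion
# (what SPSS's /MISSING PAIRWISE does) uses all available pairs per correlation
r_listwise <- cor(m, use = "complete.obs")
r_pairwise <- cor(m, use = "pairwise.complete.obs")

max(abs(r_listwise - r_pairwise))         # non-zero: the two treatments differ
```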
Next, you can compare the results of the factor analysis/extraction method itself.
FACTOR
/VARIABLES <variables>
/MISSING PAIRWISE
/ANALYSIS <variables>
/PRINT EXTRACTION
/FORMAT BLANK(.35)
/CRITERIA FACTORS(6) ITERATE(25)
/EXTRACTION ULS
/CRITERIA ITERATE(25)
/ROTATION NOROTATE.
and
fa.results <- fa(r, nfactors=6, rotate="none",
scores=TRUE, fm="pa", oblique.scores=FALSE, max.iter=25)
Again, compare the factor matrices/communalities/sum of squared loadings. Here you can expect some tiny differences but certainly not of the magnitude you describe. All this would give you a clearer idea of what's going on.
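One caveat when comparing the unrotated factor matrices: the two programs may order the factors differently or flip the signs of a whole column, and neither changes the solution. Communalities are invariant to both, which makes them a convenient first thing to compare. A toy illustration with made-up loadings:

```r
# Two loading matrices describing the same solution: columns reordered and
# one sign flipped, as can legitimately happen between SPSS and psych
L_spss <- matrix(c(0.8, 0.7, 0.1,
                   0.1, 0.2, 0.9), ncol = 2)
L_r <- cbind(-L_spss[, 2], L_spss[, 1])

# Communalities (row sums of squared loadings) agree exactly anyway
h2_spss <- rowSums(L_spss^2)
h2_r    <- rowSums(L_r^2)
all.equal(h2_spss, h2_r)                  # TRUE
```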
Now, to answer your three questions directly:
- In my experience, it's possible to obtain very similar results, sometimes after spending some time figuring out the different terminologies and fiddling with the parameters. I have had several occasions to run factor analyses in both SPSS and R (typically working in R and then reproducing the analysis in SPSS to share it with colleagues) and always obtained essentially the same results. I would therefore generally not expect large differences, which leads me to suspect the problem might be specific to your data set. I did however quickly try the commands you provided on a data set I had lying around (it's a Likert scale) and the differences were in fact bigger than I am used to but not as big as those you describe. (I might update my answer if I get more time to play with this.)
- Most of the time, people interpret the sums of squared loadings after rotation as the “proportion of variance explained” by each factor, but this is not meaningful following an oblique rotation (which is why psych does not report it at all and SPSS only reports the eigenvalues in this case – there is even a little footnote about it in the output). The initial eigenvalues are computed before any factor extraction. They don't tell you anything about the proportion of variance explained by your factors and are not really “sums of squared loadings” either (they are mostly used to decide on the number of factors to retain). SPSS's “Extraction Sums of Squared Loadings” should however match the “SS loadings” provided by psych.
- This is a wild guess at this stage, but have you checked whether the factor extraction procedure converged in 25 iterations? If the rotation fails to converge, SPSS does not output any pattern/structure matrix, so you can't miss it; but if the extraction fails to converge, the last factor matrix is displayed nonetheless and SPSS blissfully continues with the rotation. You would however see a note: “a. Attempted to extract 6 factors. More than 25 iterations required. (Convergence=XXX). Extraction was terminated.” If the convergence value is small (something like .005, the default stopping criterion being “less than .0001”), it would still not account for the discrepancies you report, but if it is really large there is something pathological about your data.
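To make the terminology in the second point concrete: the “sum of squared loadings” for a factor is simply the column sum of its squared loadings, and before an oblique rotation, dividing by the number of observed variables gives the proportion of variance explained. A sketch with made-up loadings:

```r
# Hypothetical unrotated loading matrix: 4 variables, 2 factors
L <- matrix(c(0.8, 0.7, 0.6, 0.1,
              0.1, 0.2, 0.1, 0.9), ncol = 2)

ss <- colSums(L^2)        # "SS loadings" in psych / "Sums of Squared Loadings" in SPSS
prop_var <- ss / nrow(L)  # proportion of total variance; meaningful pre-rotation

ss                        # 1.50 and 0.87
```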
Firstly, principal components and factor analysis are quite different methods. PCA is normally used more as a data reduction technique, while factor analysis is more concerned with finding a latent structure.
Regarding the cross-loadings: an oblique rotation allows the factors to be correlated, but typically one would not want items to load on multiple factors. In this case, I would examine the factor loadings under other oblique rotations, such as oblimin, to see if these cross-loadings still appear.
Cross-loadings below .3 are often ignored, but if you have multiple samples with the same cross-loadings, this may indicate that the item is indeed associated with more than one factor. Typically, such items are discarded, and I would probably do so unless you have a strong theoretical or practical rationale for retaining them.
Finally, it sounds like you have two samples. In this case, I would perform EFA on your first sample, and then use the second sample to validate your model. This will raise the probability that you are modelling something real, rather than noise.
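A minimal sketch of that split-sample workflow (random stand-in data; in practice you would run the EFA on the exploratory half and fit a confirmatory model on the validation half):

```r
set.seed(42)
items <- matrix(rnorm(200 * 6), ncol = 6)   # stand-in for your item responses

idx <- sample(nrow(items), nrow(items) / 2)
exploratory <- items[idx, ]                 # run the EFA on this half
validation  <- items[-idx, ]                # validate the factor structure here

nrow(exploratory) + nrow(validation) == nrow(items)   # TRUE: disjoint halves
```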
Best Answer
The psych package's fa function should already provide you with the matrix of correlations between factors; there is no need to compute it yourself. This information is located in a section of the output labeled "With factor correlations of", just below the output pertaining to the proportion of variance explained by your factor solution. Though you could examine correlations between factor scores, this would introduce the problem of rotational indeterminacy; correlations between latent factors (which fa provides automatically), as opposed to observed factor scores, do not share this limitation. In practice, though, both approaches will likely produce similar estimates of the factor correlations. A matrix of correlations between factor loadings, on the other hand, would not provide you with the kind of information you are seeking about the association between your factors.
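For completeness, those latent factor correlations are also stored in the Phi component of the object returned by fa, so you can pull them out programmatically rather than reading them off the printed output. A sketch, assuming the psych package is installed (the data here are random and purely illustrative):

```r
library(psych)  # assumed installed

set.seed(1)
d <- matrix(rnorm(500 * 6), ncol = 6)       # stand-in for your item data
res <- fa(d, nfactors = 2, rotate = "promax", fm = "pa")

res$Phi   # 2 x 2 matrix of correlations between the latent factors
```

Note that Phi is only populated after an oblique rotation (promax, oblimin, ...); with an orthogonal rotation the factors are uncorrelated by construction.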