Solved – Calculating variance explained by factors after exploratory factor analysis with oblique rotation in R

Tags: factor-analysis, factor-rotation, r

We conducted an exploratory factor analysis using the psych package with oblique rotation and found an acceptable solution with 3 factors. Now a reviewer asks me to provide the proportion of variance explained by each of these factors. Having seen other posts on this issue (What's the relationship between initial eigenvalues and sums of squared loadings in factor analysis? and Interpreting discrepancies between R and SPSS with exploratory factor analysis), I still wonder what I should provide. Did the (anonymous) reviewer mean the eigenvalue-based proportion of variance of the principal components (method 1), even though he was speaking of "each of these factors"?

library(psych)
library(GPArotation)
library(data.table)
#load sample data from the internet for demonstration
myDT <- fread('https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/psych/msq.csv')
myDT <- myDT[,c(7:15)] #choosing a selection of variables to simplify things

efa <- fa(myDT,nfactors = 3, rotate = "oblimin", fm="minres") #efa with oblique rotation
propEV <- 100*efa$e.values[1:3]/length(efa$e.values) # method 1
round(propEV,2)
[1] 27.56 18.20 14.31
round(sum(propEV),2) #total
[1] 60.08
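
As a side note on method 1: the eigenvalues in efa$e.values are, as far as the psych documentation describes them, those of the original correlation matrix, so these percentages are the ones a principal component analysis would report and do not depend on the factor extraction. A minimal base-R sketch of the same calculation, assuming the built-in mtcars data as a stand-in (it is not the msq data used above):

```r
# Illustration only: built-in mtcars data, not the msq items used above
R <- cor(mtcars[, 1:5])          # correlation matrix of 5 variables
ev <- eigen(R)$values            # eigenvalues, as in a PCA
p <- ncol(R)

sum(ev)                          # equals p, the total standardized variance
propEV <- 100 * ev / p           # same formula as method 1
round(propEV, 2)
round(sum(propEV[1:2]), 2)       # share of the first two components
```

Because the eigenvalues always sum to the number of variables, the denominator in method 1 is total variance, not common variance.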

Or is it more relevant to calculate the proportion each factor explains? And which method should I choose to calculate this? The calculation based on SS loadings in the psych package (method 2) seems to match SPSS's "Extraction Sums of Squared Loadings" (cf. this post) for unrotated factors.

propSS <- efa$Vaccounted # method 2
round(propSS,2)

                      MR1  MR2  MR3
SS loadings           1.66 1.22 0.84
Proportion Var        0.18 0.14 0.09
Cumulative Var        0.18 0.32 0.41
Proportion Explained  0.45 0.33 0.23
Cumulative Proportion 0.45 0.77 1.00

Ziberna proposes (in his response here) calculating the mean communality as the total % of variance explained (method 3), which produces results similar to those of method 2.

mean(efa$communalities) # method 3
[1] 0.4127141

Lorenzo-Seva (2013) states here: "The reduced correlation matrix computed in most factor analysis methods is systematically non-positive definite. The typical conclusion is that the percentage of explained common variance cannot be computed in EFA." He therefore proposes to use Minimum Rank Factor Analysis instead. When I conduct this with the psych package, the results seem to differ slightly from the EFA above, both when based on SS loadings (method 4) and when based on communalities (method 5).

efa2 <- fa(myDT,nfactors = 3, rotate = "oblimin", fm="minrank") #minimum rank fa with oblique rotation
propSS <- efa2$Vaccounted # method 4
round(propSS,2)

                       MRFA1 MRFA2 MRFA3
SS loadings            1.66  1.27  0.89
Proportion Var         0.18  0.14  0.10
Cumulative Var         0.18  0.33  0.42
Proportion Explained   0.43  0.33  0.23
Cumulative Proportion  0.43  0.77  1.00

mean(efa2$communalities) # method 5
[1] 0.45381

However, the proportions from methods 1-5 do not seem to be based on common variance; they all seem to use the denominator 9 (the number of items). But how does this match the idea of factors reflecting common variance?
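
The arithmetic behind this observation can be checked directly from the printed method 2 table. The sketch below only re-derives the printed rows from the rounded SS loadings; it is a reading of the output, not a statement about how psych computes it internally.

```r
# SS loadings copied from the printed efa$Vaccounted table (rounded values)
ss <- c(MR1 = 1.66, MR2 = 1.22, MR3 = 0.84)
p <- 9                        # number of items

round(ss / p, 2)              # reproduces the "Proportion Var" row (denominator: total variance)
round(cumsum(ss) / p, 2)      # reproduces the "Cumulative Var" row
round(ss / sum(ss), 2)        # reproduces the "Proportion Explained" row (denominator: sum of SS loadings)
```

On this reading, only the "Proportion Explained" row is relative to the variance captured by the three factors; the "Proportion Var" row and the mean communality are relative to the total variance of the 9 items.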

So the question remains: what is usually reported in papers in terms of explained variance after an oblique EFA, and how is it implemented in R?

Best Answer

I do not know what is usually reported in papers using oblique factor analysis. However, this is what I would do, as in this case at least I know exactly what I am reporting and this makes sense to me.

To compute the percentage of variance of an individual variable explained by a given factor, one can square the structure loadings. Summing these squares over all variables gives the sum of the variances of all variables explained by a given factor (the SS loadings). This is also what is computed by SPSS. Dividing this by the sum of the variances of all variables (equal to the number of variables in the case of standardized variables, which always applies when a correlation matrix is used, as fa from psych does by default) gives the share (%) of variance explained by each individual factor. I think you can report that; just make sure that you do not sum these shares across factors, which does not make sense when the factors are correlated. In addition, I would report the % of variance explained by all factors together. With correlations/standardized variables, this can be computed as the mean communality, which is the same as the total percentage output by the print method for the object returned by fa.

Here is how to compute all this based on the efa object from opening question.

# Compute SS loadings from the structure loadings
SS <- colSums(efa$Structure^2)
# Share of variance explained by each factor
SS / length(efa$communality)
# Total share of variance explained by all factors
mean(efa$communality)
# WRONG - cumulative percentages are not meaningful for correlated factors!
cumsum(SS / length(efa$communality))
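
To see why the per-factor shares should not be summed, here is a small self-contained sketch in base R. The pattern loadings P and the factor correlation matrix Phi are made-up values for illustration, not estimates from the msq data; the structure loadings are obtained as P %*% Phi.

```r
# Hypothetical pattern loadings (4 items, 2 factors) and factor correlations
P <- matrix(c(0.8, 0.0,
              0.7, 0.1,
              0.0, 0.6,
              0.1, 0.7), ncol = 2, byrow = TRUE)
Phi <- matrix(c(1.0, 0.5,
                0.5, 1.0), ncol = 2)

S <- P %*% Phi                   # structure loadings = item-factor correlations
h2 <- rowSums(P * S)             # communalities, i.e. diag(P %*% Phi %*% t(P))

prop <- colSums(S^2) / nrow(P)   # share of variance per factor
prop
sum(prop)                        # 0.695: variance shared by both factors is counted twice
mean(h2)                         # 0.535: total share explained by the two factors together
```

In this toy example the summed per-factor shares (0.695) exceed the total share explained (0.535) because the factors correlate at 0.5, which is why the mean communality is the safer figure for the total.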

Just a note: I think some things have changed in the psych package since I wrote the answer that the OP is citing, although using the mean communality is still OK.

Related Solutions

Solved – Interpreting discrepancies between R and SPSS with exploratory factor analysis

First of all, I second ttnphns' recommendation to look at the solution before rotation. Factor analysis as implemented in SPSS is a complex procedure with several steps; comparing the results of each of these steps should help you pinpoint the problem.

Specifically you can run

FACTOR
/VARIABLES <variables>
/MISSING PAIRWISE
/ANALYSIS <variables>
/PRINT CORRELATION
/CRITERIA FACTORS(6) ITERATE(25)
/EXTRACTION ULS
/CRITERIA ITERATE(25)
/ROTATION NOROTATE.

to see the correlation matrix SPSS is using to carry out the factor analysis. Then, in R, prepare the correlation matrix yourself by running

r <- cor(data)

Any discrepancy in the way missing values are handled should be apparent at this stage. Once you have checked that the correlation matrix is the same, you can feed it to the fa function and run your analysis again:

fa.results <- fa(r, nfactors=6, rotate="promax",
scores=TRUE, fm="pa", oblique.scores=FALSE, max.iter=25)

If you still get different results in SPSS and R, the problem is not missing values-related.
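
For the correlation-matrix step above, note that base R's cor() does not delete missing values pairwise by default, whereas the SPSS syntax requests /MISSING PAIRWISE. A minimal sketch on simulated data (the data frame here is hypothetical) shows how the treatments differ:

```r
# Hypothetical data with a few missing values
set.seed(42)
data <- as.data.frame(matrix(rnorm(200), ncol = 4))
data[1:5, 1] <- NA
data[6:10, 2] <- NA

r_default  <- cor(data)                                 # NA wherever a column has missing values
r_pairwise <- cor(data, use = "pairwise.complete.obs")  # analogous to /MISSING PAIRWISE
r_listwise <- cor(data, use = "complete.obs")           # analogous to listwise deletion

anyNA(r_default)
max(abs(r_pairwise - r_listwise))   # how much the two treatments differ
```

If the SPSS correlation matrix only matches one of these, the missing-value handling is the likely source of the discrepancy.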

Next, you can compare the results of the factor analysis/extraction method itself.

FACTOR
/VARIABLES <variables>
/MISSING PAIRWISE
/ANALYSIS <variables>
/PRINT EXTRACTION
/FORMAT BLANK(.35)
/CRITERIA FACTORS(6) ITERATE(25)
/EXTRACTION ULS
/CRITERIA ITERATE(25)
/ROTATION NOROTATE.

and

fa.results <- fa(r, nfactors=6, rotate="none", 
scores=TRUE, fm="pa", oblique.scores=FALSE, max.iter=25)

Again, compare the factor matrices/communalities/sum of squared loadings. Here you can expect some tiny differences but certainly not of the magnitude you describe. All this would give you a clearer idea of what's going on.

Now, to answer your three questions directly:

1. In my experience, it's possible to obtain very similar results, sometimes after spending some time figuring out the different terminologies and fiddling with the parameters. I have had several occasions to run factor analyses in both SPSS and R (typically working in R and then reproducing the analysis in SPSS to share it with colleagues) and always obtained essentially the same results. I would therefore generally not expect large differences, which leads me to suspect the problem might be specific to your data set. I did however quickly try the commands you provided on a data set I had lying around (it's a Likert scale) and the differences were in fact bigger than I am used to but not as big as those you describe. (I might update my answer if I get more time to play with this.)
2. Most of the time, people interpret the sum of squared loadings after rotation as the “proportion of variance explained” by each factor but this is not meaningful following an oblique rotation (which is why it is not reported at all in psych and SPSS only reports the eigenvalues in this case – there is even a little footnote about it in the output). The initial eigenvalues are computed before any factor extraction. Obviously, they don't tell you anything about the proportion of variance explained by your factors and are not really “sum of squared loadings” either (they are often used to decide on the number of factors to retain). SPSS “Extraction Sums of Squared Loadings” should however match the “SS loadings” provided by psych.
3. This is a wild guess at this stage but have you checked if the factor extraction procedure converged in 25 iterations? If the rotation fails to converge, SPSS does not output any pattern/structure matrix and you can't miss it but if the extraction fails to converge, the last factor matrix is displayed nonetheless and SPSS blissfully continues with the rotation. You would however see a note “a. Attempted to extract 6 factors. More than 25 iterations required. (Convergence=XXX). Extraction was terminated.” If the convergence value is small (something like .005, the default stopping condition being “less than .0001”), it would still not account for the discrepancies you report but if it is really large there is something pathological about your data.

Solved – Exploratory factor analysis – promax & factor cross-loadings

Firstly, principal components and factor analysis are quite different methods. PCA is normally used more as a data reduction technique, while factor analysis is more concerned with finding a latent structure.
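
The difference can be illustrated with a small simulation in base R. This is a sketch under stated assumptions: the one-factor data are simulated with made-up loadings, prcomp() stands in for PCA, and factanal() (maximum likelihood) stands in for factor analysis.

```r
# Simulate 6 standardized items driven by one latent factor (hypothetical loadings)
set.seed(1)
n <- 500
f <- rnorm(n)
lambda <- c(0.8, 0.7, 0.6, 0.5, 0.4, 0.3)
x <- sapply(lambda, function(l) l * f + sqrt(1 - l^2) * rnorm(n))
colnames(x) <- paste0("v", 1:6)

pca <- prcomp(x, scale. = TRUE)
pca_var <- pca$sdev^2            # eigenvalues of the correlation matrix
fa1 <- factanal(x, factors = 1)  # maximum-likelihood factor analysis
common <- 1 - fa1$uniquenesses   # communalities: common variance per item

round(pca_var / 6, 2)            # PCA: shares of total variance, summing to 1
round(common, 2)                 # FA: common variance only, unique variance left out
```

The components jointly account for all of the variance, whereas the factor model splits each item's variance into a common part and a unique part.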

On the cross loadings, the oblique rotation allows the factors to be correlated, but typically one would not want items to load on multiple factors. In this case, I would probably examine the factor loadings using other oblique rotations such as oblimin to see if these cross-loadings still appear.

Cross-loadings below .3 are often ignored, but if you have multiple samples with the same cross-loadings, then this may be an indication that the item is indeed associated with more than one factor. Typically, these items are discarded, and I would probably do so unless you have a strong theoretical or practical rationale for retaining them.

Finally, it sounds like you have two samples. In this case, I would perform EFA on your first sample, and then use the second sample to validate your model. This will raise the probability that you are modelling something real, rather than noise.
