Solved – Calculating variance explained by factors after exploratory factor analysis with oblique rotation in R

factor analysisfactor-rotationr

We conducted an exploratory factor analysis using the psych package with oblique rotation and found an acceptable solution with 3 factors. Now a reviewer ask me to provide the proportion of variance explained by each of these factors. Having seen other posts on this issue (What's the relationship between initial eigenvalues and sums of squared loadings in factor analysis? and
Interpreting discrepancies between R and SPSS with exploratory factor analysis), I still wonder what I should provide. Did the (anonymous) reviewer mean the Eigenvalue-based proportion of variance of the principal component (method 1), although he was speaking of "each of these factors"?

library(psych)
library(GPArotation)
library(data.table)
#load sample data from the internet for demonstration
myDT <- fread('https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/psych/msq.csv')
myDT <- myDT[,c(7:15)] #choosing selection of variables to simpify things

efa <- fa(myDT,nfactors = 3, rotate = "oblimin", fm="minres") #efa with oblique rotation
propEV <- 100*efa$e.values[1:3]/length(efa$e.values) # method 1
round(propEV,2)
[1] 27.56 18.20 14.31
round(sum(propEV),2) #total
[1] 60.08

Or is it more relevant to calculate the proportion each factor explains? And which method should I choose to calculate this? The calculation based on SS-loadings in the psych package (method 2) seems to match SPSS' "Extraction Sums of Squared Loadings" (cf. this post) for unrotated factors .

propSS <- efa$Vaccounted # method 2
round(propSS,2)

                      MR1  MR2  MR3
SS loadings           1.66 1.22 0.84
Proportion Var        0.18 0.14 0.09
Cumulative Var        0.18 0.32 0.41
Proportion Explained  0.45 0.33 0.23
Cumulative Proportion 0.45 0.77 1.00

Ziberna proposes (in his response here) to calculate mean communality for the total % of variance explained method 3, which produces similar results like method 2.

mean(efa$communalities) # method 3
[1] 0.4127141

Lorenzo-Seva (2013) states here
"The reduced correlation matrix computed in most factor analysis methods is systematically non-positive definite. The typical conclusion is that the percentage of explained common variance cannot be computed in EFA." and therefore proposes to use Minimum Rank Factor Analysis instead. When I conduct this with the psych package, it seem to differ slightly from the EFA above when based on SS-loadings (method 4) and again, when based on communalities (method 5).

efa2 <- fa(myDT,nfactors = 3, rotate = "oblimin", fm="minrank") #minimum rank fa with oblique rotation
propSS <- efa2$Vaccounted # method 4
round(propSS,2)

                       MRFA1 MRFA2 MRFA3
SS loadings            1.66  1.27  0.89
Proportion Var         0.18  0.14  0.10
Cumulative Var         0.18  0.33  0.42
Proportion Explained   0.43  0.33  0.23
Cumulative Proportion  0.43  0.77  1.00

mean(efa2$communalities) # method 5
[1] 0.45381

However, the proportion of all methods 1-5 seems not to be based on common variance, all seem to have the denominator 9 (= # of items). But how does this match with the idea of factors reflecting common variance?

So the question remains what is usually reported in papers in terms of explained variance after oblique efa and how is it implemented in R?

Best Answer

I do not know what is usually reported in papers using oblique factor analysis. However, this is what I would do, as in this case at least I know exactly what I am reporting and this makes sense to me.

To compute the percentage of variance of an individual variable, explained by a given factor, one can compute the squares of structure loadings. If we sum this by all variables, we get the sum of the variances (SS loadings) of all variables explained by a given factor. This is also what is computed by SPSS. If we divide this by the sum of all variances of the variables (equal to the number of variances in cased of standardized variables - that is the case always when using correlation matrix, as fa from psych does by default), we get the share/% of explained variance by individual factors. I think you can report that, just make sure that you do not sum this together by factors, which does not make sense when factors are correlated. In addition to that, I would report the % of variance explained by all factors. This can be (in case of use of correlations/standardized variables) computed as mean communality or is actually the same as the total percentage outputted by the print method for the object returned by fa.

Here is how to compute all this based on the efa object from opening question.

# Compute SS loadings
SS<-colSums(efa$Structure^2)
# Compute percentage of explained variance by factor
SS/length(efa$communality)
# Total explained variability
mean(efa$communality)
# WRONG - comulative percantages !!!
cumsum(SS/length(efa$communality))

Just a note. I thing some things have changed in the psych package since my answer that the OP is citing, although using the mean communality is still ok.