R – How to Use lavaan (SEM/CFA) for Factor Analysis Like factanal (EFA)

factor analysislavaanrstructural-equation-modeling

I understand that lavaan is designed to do SEM/CFA while the R function factanal does EFA. EFA and CFA seem very very similar, and so I wonder why I don't seem to be able to specify what to me looks like the same model in lavaan as I can fit in factanal.

Have I misunderstood the statistical relationship between CFA and EFA, or am I simply misusing lavaan syntax?

For example, using the classic Holzinger-Swineford data we can look for two factors in the first 6 observables. lavaan throws this out with an error,

> library(lavaan)
> model <- 'f1 =~ x1 + x2 + x3 + x4 + x5 + x6
+           f2 =~ x1 + x2 + x3 + x4 + x5 + x6
+ '
> fit  <- cfa(model, data = HolzingerSwineford1939)
Warning message:
In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats,  :
  lavaan WARNING: could not compute standard errors!
  lavaan NOTE: this may be a symptom that the model is not identified.

but factanal is fine with it:

> 
> factanal(~x1+x2+x3+x4+x5+x6, factors = 2, data = HolzingerSwineford1939)

Call:
factanal(x = ~x1 + x2 + x3 + x4 + x5 + x6, factors = 2, data = HolzingerSwineford1939)

Uniquenesses:
   x1    x2    x3    x4    x5    x6 
0.574 0.787 0.441 0.284 0.232 0.304 

Loadings:
   Factor1 Factor2
x1 0.293   0.584  
x2 0.106   0.449  
x3         0.747  
x4 0.824   0.191  
x5 0.873          
x6 0.802   0.231  

               Factor1 Factor2
SS loadings      2.183   1.196
Proportion Var   0.364   0.199
Cumulative Var   0.364   0.563

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 2.07 on 4 degrees of freedom.
The p-value is 0.722 

How do I specific a model like factanal is doing in lavaan?

Best Answer

It is possible to do EFA in a CFA framework. This is sometimes called "E/CFA". A nice discussion of this can be found in:

Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York: Guilford Press.

For this to work, you need to have an "anchor item" for each factor, for which there are no cross-loadings. Looking at the results from factanal, it would make sense to make item x5 the anchor item for the first factor and x3 the anchor item for the second factor. You also need to constrain the variances of the latent factors to 1. And since factanal standardizes the variables, you'll want to do the same here. At the same time, to make things more comparable, I would use an oblique rotation method for EFA (since the E/CFA model will allow the factors to be correlated). So, putting this all together, you can compare:

model <- 'f1 =~ x1 + x2 + 0*x3 + x4 + x5 + x6
          f2 =~ x1 + x2 + x3 + x4 + 0*x5 + x6'

fit  <- cfa(model, data=HolzingerSwineford1939, std.lv=T, std.ov=T)
summary(fit)

factanal(~x1+x2+x3+x4+x5+x6, factors=2, data=HolzingerSwineford1939, rotation="promax")

Here are the results:

lavaan (0.5-18) converged normally after  26 iterations

  Number of observations                           301

  Estimator                                         ML
  Minimum Function Test Statistic                2.109
  Degrees of freedom                                 4
  P-value (Chi-square)                           0.716

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
  f1 =~
    x1                0.275    0.061    4.499    0.000
    x2                0.092    0.063    1.469    0.142
    x3                0.000
    x4                0.822    0.050   16.550    0.000
    x5                0.875    0.049   17.964    0.000
    x6                0.798    0.050   16.013    0.000
  f2 =~
    x1                0.559    0.074    7.520    0.000
    x2                0.441    0.071    6.183    0.000
    x3                0.746    0.086    8.644    0.000
    x4                0.120    0.052    2.321    0.020
    x5                0.000
    x6                0.161    0.052    3.085    0.002

Covariances:
  f1 ~~
    f2                0.119    0.088    1.348    0.178

Variances:
    x1                0.572    0.072
    x2                0.784    0.074
    x3                0.440    0.112
    x4                0.283    0.035
    x5                0.231    0.037
    x6                0.303    0.035
    f1                1.000
    f2                1.000

> 
> factanal(~x1+x2+x3+x4+x5 + x6, factors = 2, data = HolzingerSwineford1939, rotation="promax")

Call:
factanal(x = ~x1 + x2 + x3 + x4 + x5 + x6, factors = 2, data = HolzingerSwineford1939,     rotation = "promax")

Uniquenesses:
   x1    x2    x3    x4    x5    x6 
0.574 0.787 0.441 0.284 0.232 0.304 

Loadings:
   Factor1 Factor2
x1  0.180   0.557 
x2          0.457 
x3 -0.147   0.797 
x4  0.842         
x5  0.922  -0.126 
x6  0.809         

               Factor1 Factor2
SS loadings      2.268   1.173
Proportion Var   0.378   0.196
Cumulative Var   0.378   0.574

Factor Correlations:
        Factor1 Factor2
Factor1   1.000   0.417
Factor2   0.417   1.000

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 2.07 on 4 degrees of freedom.
The p-value is 0.722

According to Brown (2006), the chi-square statistic should be the same for both approaches. For the example shown in the book, this is indeed the case (but the author uses Mplus for the EFA and E/CFA analyses). In this particular example, the values are close (2.109 and 2.07), but not identical. There seems to be some minor difference in how lavaan and factanal are working here, but ultimately the point is that one can indeed do an exploratory factor analysis using CFA software.

Related Question