How to interpret the model fit indices generated by lavaan (in R)? Something wrong with the model specifications

I am trying to replicate a path analysis (SEM) model using lavaan in R, and I am confused by the model fit statistics it reports.

The code is as follows:

#Import Package
library(lavaan)

#Input Correlation Matrix
sigma <- matrix(c(1.00, -0.03,  0.39, -0.05, -0.08,
                 -0.03,  1.00,  0.07, -0.23, -0.16,
                  0.39,  0.07,  1.00, -0.13, -0.29,
                 -0.05, -0.23, -0.13,  1.00,  0.34,
                 -0.08, -0.16, -0.29,  0.34,  1.00), nrow = 5, byrow = TRUE)
rownames(sigma) <- c("Exercise", "Hardiness", "Fitness", "Stress", "Illness")
colnames(sigma) <- c("Exercise", "Hardiness", "Fitness", "Stress", "Illness")

#Create Covariance Matrix
sdevs <-c(66.5, 3.8, 18.4, 6.7, 624.8)
covmax <- cor2cov(sigma, sdevs)
as.matrix(covmax)

#Specify Model 
mymodel<-'Illness ~ Exercise + Fitness
Illness ~ Hardiness + Stress
Fitness ~ Exercise + Hardiness 
Stress ~ Exercise + Hardiness + Fitness 
Exercise ~~ Exercise 
Hardiness ~~ Hardiness 
Exercise ~~ Hardiness'

#Fit the model with the covariance matrix
N = 363
fit.path <-sem(mymodel,sample.cov=covmax, sample.nobs=N, fixed.x=FALSE)

#Summary of the model fit
summary(fit.path, fit.measures = TRUE)

And the output I get is as follows:

 lavaan (0.5-12) converged normally after  93 iterations

 Number of observations                         37300

 Estimator                                         ML
 Minimum Function Test Statistic                0.000
 Degrees of freedom                                 0
 P-value (Chi-square)                           1.000

 Model test baseline model:

 Minimum Function Test Statistic            16594.387
 Degrees of freedom                                10
 P-value                                        0.000

 Full model versus baseline model:

 Comparative Fit Index (CFI)                    1.000
 Tucker-Lewis Index (TLI)                       1.000

 Loglikelihood and Information Criteria:

 Loglikelihood user model (H0)             -882379.005
 Loglikelihood unrestricted model (H1)     -882379.005

 Number of free parameters                         15
 Akaike (AIC)                              1764788.009
 Bayesian (BIC)                            1764915.910
 Sample-size adjusted Bayesian (BIC)       1764868.240

 Root Mean Square Error of Approximation:

 RMSEA                                          0.000
 90 Percent Confidence Interval          0.000  0.000
 P-value RMSEA <= 0.05                          1.000

 Standardized Root Mean Square Residual:

 SRMR                                           0.000

 Parameter estimates:

 Information                                 Expected
 Standard Errors                             Standard

                Estimate  Std.err  Z-value  P(>|z|)
 Regressions:
 Illness ~
 Exercise          0.318    0.048    6.640    0.000
 Fitness          -8.835    0.174  -50.737    0.000
 Hardiness       -12.146    0.793  -15.321    0.000
 Stress           27.125    0.451   60.079    0.000
 Fitness ~
 Exercise          0.109    0.001   82.602    0.000
 Hardiness         0.396    0.023   17.211    0.000
 Stress ~
 Exercise         -0.001    0.001   -2.614    0.009
 Hardiness        -0.393    0.009  -44.332    0.000
 Fitness          -0.040    0.002  -19.953    0.000

 Covariances:
 Exercise ~~
 Hardiness        -7.581    1.309   -5.791    0.000

 Variances:
 Exercise       4422.131   32.381
 Hardiness        14.440    0.106
 Illness       318744.406 2334.012
 Fitness         284.796    2.085
 Stress           41.921    0.307

These are my questions:

  • Why does the chi-squared test report zero degrees of freedom?
  • Why are the p-values exactly 1? Why are the CFI and TLI exactly 1?
  • Why is the RMSEA 0?
  • What would I need to do to simulate a more realistic model that doesn't appear artificially "perfect"?
  • Does it have to do with the model specification?

Best Answer

It appears that this is a model where (almost) everything is regressed on everything else.

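You have 5 variables in your model. That means you have 10 covariances (plus 5 variances, for 15 unique elements in the covariance matrix).

You have 10 parameters: 9 regression paths and 1 covariance (plus the 5 variances, which gives the 15 free parameters shown in your output).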

The df of the model is equal to (number of covariances) - (number of parameters). This is zero. The model is described as saturated, and it's not testing anything. Because it's not testing anything, the fit indices are all perfect. (This will make sense if you look at the formulas for the fit indices - a zero chi-square should give you these fit indices).
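The counting above can be reproduced in a few lines of base R. This is only a sketch of the bookkeeping, not lavaan's internal computation: the helper name n_covariances is made up for illustration, and the CFI line uses the usual textbook formula together with the chi-square values printed in the question's output.

```r
# Unique covariances among p observed variables
# (variances are left out on both sides, as in the count above).
n_covariances <- function(p) p * (p - 1) / 2

p          <- 5          # Exercise, Hardiness, Fitness, Stress, Illness
n_paths    <- 4 + 2 + 3  # regressions on Illness, Fitness and Stress
n_exog_cov <- 1          # Exercise ~~ Hardiness
n_params   <- n_paths + n_exog_cov

df_model <- n_covariances(p) - n_params
df_model                 # 0, so the model is saturated

# CFI = 1 - max(chisq_M - df_M, 0) / max(chisq_B - df_B, chisq_M - df_M, 0)
chisq_model <- 0.000;     df_m <- 0    # user model, from the output
chisq_base  <- 16594.387; df_b <- 10   # baseline model, from the output
cfi <- 1 - max(chisq_model - df_m, 0) /
           max(chisq_base - df_b, chisq_model - df_m, 0)
cfi                      # 1
```

Counting the 5 variances on both sides (15 unique elements, 15 free parameters) gives the same df of zero.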

What do you mean by simulating a model? If you don't want the fit to be perfect, add some constraints. Typically, one constrains a path to zero.
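As a rough illustration of such a constraint, the sketch below refits the model from the question with one path fixed to zero. The choice of Stress ~ Exercise is an assumption made only for the example (it had the smallest estimate in the output), not a recommendation; which path to constrain should come from theory. N = 363 is taken from the question's code.

```r
library(lavaan)

vars  <- c("Exercise", "Hardiness", "Fitness", "Stress", "Illness")
sigma <- matrix(c( 1.00, -0.03,  0.39, -0.05, -0.08,
                  -0.03,  1.00,  0.07, -0.23, -0.16,
                   0.39,  0.07,  1.00, -0.13, -0.29,
                  -0.05, -0.23, -0.13,  1.00,  0.34,
                  -0.08, -0.16, -0.29,  0.34,  1.00),
                nrow = 5, byrow = TRUE, dimnames = list(vars, vars))
covmax <- cor2cov(sigma, c(66.5, 3.8, 18.4, 6.7, 624.8))

# Same model as in the question, except that the Exercise -> Stress
# path is fixed to zero (illustrative choice, see the note above).
constrained <- '
  Illness ~ Exercise + Fitness + Hardiness + Stress
  Fitness ~ Exercise + Hardiness
  Stress  ~ 0*Exercise + Hardiness + Fitness
'

fit.constrained <- sem(constrained, sample.cov = covmax,
                       sample.nobs = 363, fixed.x = FALSE)

# One constraint buys one degree of freedom, so the chi-square test
# and the fit indices now carry information about misfit.
fitMeasures(fit.constrained, c("chisq", "df", "pvalue", "cfi", "rmsea"))
```

With a single constraint the model has 1 df, so the fit indices are no longer forced to their perfect values.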

So yes, it has to do with the model specification. It's an unusual model to test with an SEM, but if that's the model you want to test, that's your model. If you want to make it more testable, you need to add a variable which is a possible cause of one variable, but not of the others. For example, social support might influence stress, but should not (directly) influence illness, and perhaps not the others. If you add social support, and put an arrow from social support ONLY to stress, you add 5 covariances to the model but only 1 path. If social support is also left uncorrelated with exercise and hardiness, the model will have 4 df; if it is allowed to covary with them (which costs 2 more parameters), the model will have 2 df. Either way, the fit is no longer guaranteed to be perfect.
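The effect of adding such a variable on the degrees of freedom can be checked with simulated data. Everything about the population model below is hypothetical: the variable name Support and all coefficients are arbitrary placeholders chosen only so that lavaan's simulateData() has something to generate from; they are not estimates from any study. With 6 variables there are 15 covariances; the sketch fits the model once with Support free to covary with Exercise and Hardiness and once with those two covariances fixed to zero.

```r
library(lavaan)

# Hypothetical population model; all coefficients are made up.
pop.model <- '
  Fitness ~ 0.4*Exercise + 0.1*Hardiness
  Stress  ~ 0.3*Support + 0.2*Hardiness + 0.1*Fitness
  Illness ~ 0.3*Stress + 0.2*Fitness + 0.1*Hardiness
'
dat <- simulateData(pop.model, sample.nobs = 363, seed = 1)

# Model from the question plus one new path: Support -> Stress only.
fit.model <- '
  Illness ~ Exercise + Fitness + Hardiness + Stress
  Fitness ~ Exercise + Hardiness
  Stress  ~ Exercise + Hardiness + Fitness + Support
'

# Support is free to covary with Exercise and Hardiness.
fit.free <- sem(fit.model, data = dat, fixed.x = FALSE)
fitMeasures(fit.free, "df")    # 2

# Support is forced to be uncorrelated with Exercise and Hardiness.
fit.model.uncor <- paste(fit.model,
                         'Support ~~ 0*Exercise + 0*Hardiness',
                         sep = "\n")
fit.uncor <- sem(fit.model.uncor, data = dat, fixed.x = FALSE)
fitMeasures(fit.uncor, "df")   # 4
```

The 2 df in the first fit correspond to the two omitted direct paths from Support to Fitness and Illness; fixing the two exogenous covariances to zero adds 2 more.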