I was playing around in lavaan to bind together two simple models I previously tested on their own via simple regression analyses. I followed the tutorial provided on the following site: http://lavaan.ugent.be/tutorial/sem.html.
Basically, I have two latent variables ($lv1$ and $lv2$) one has three manifest indicators ($x1$, $x2$, $x3$), the other one has four ($x1$, $x2$, $x3$, $x4$). Both variables predict another variable $y$. Visually speaking:
The data used only contains positive values. I modeled this as following in R and lavaan:
semModel <- '
# measurement models
lv1 =~ x1 + x2 + x3
lv2 =~ x1 + x2 + x3 + x4
# regressions
y ~ lv1
y ~ lv2
# residual correlations
x1 ~~ x2
x1 ~~ x3
x1 ~~ x4
x2 ~~ x3
x2 ~~ x4
x3 ~~ x4
'
Then I ran the following:
fit <- sem(semModel1, data = experimentalData)
summary(fit)
This returned the following errors:
1: In lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats, :
lavaan WARNING:
Could not compute standard errors! The information matrix could
not be inverted. This may be a symptom that the model is not
identified.
2: In lav_object_post_check(object) :
lavaan WARNING: some estimated lv variances are negative
I then added the option std.ov
to standardise observed variables which still yields the error regarding the standard errors.
fit <- sem(semModel, data = experimentalData, std.ov = TRUE)
In lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats, :
lavaan WARNING:
Could not compute standard errors! The information matrix could
not be inverted. This may be a symptom that the model is not
identified.
In the second case, output is as following:
lavaan 0.6-5 ended normally after 32 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 21
Number of observations 583
Model Test User Model:
Test statistic NA
Degrees of freedom -6
P-value (Unknown) NA
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
lv1 =~
x1 1.000
x2 1.155 NA
x3 1.691 NA
lv2 =~
x1 1.000
x2 2.006 NA
x3 2.224 NA
x4 1.422 NA
Regressions:
Estimate Std.Err z-value P(>|z|)
y ~
lv1 0.020 NA
lv2 0.889 NA
Covariances:
Estimate Std.Err z-value P(>|z|)
.x1 ~~
.x2 0.033 NA
.x3 0.134 NA
.x4 -0.003 NA
.x2 ~~
.x3 -0.104 NA
.x4 -0.202 NA
.x3 ~~
.x4 0.013 NA
lv1 ~~
lv2 -0.159 NA
Variances:
Estimate Std.Err z-value P(>|z|)
.x1 0.918 NA
.x2 0.415 NA
.x3 0.444 NA
.x4 0.405 NA
.y 0.772 NA
lv1 0.105 NA
lv2 0.294 NA
Where did I miss something? Are there (logical) errors in the definition of my model?
Best Answer
The model is not identified, which means there is no unique solution to the estimation problem. Identification is a challenging topic, one that is often overlooked. First, your graphical model is incorrect. You have manifest variables pointing to the latent variables, when in your model, the manifest variables measure the latent variable. Second, the cause for the lack of identification is the residual covariances among all the indicators and the fact that all the indicators load on both latent variables. With so few indicators and most of them shared between the latent variables, you cannot supply any residual covariances. In general, each latent variable needs two unique indicators (i.e., unique to each latent variable), and residual covariances generally cannot be included without more than four indicators.
I highly recommend you look at the identification chapter in an SEM textbook, like Bollen (1989). There are specific rules to identification and ways to assess whether your model is identified.