I specified the following model for SEM analysis using the 'lavaan' package in R. I want to specify a covariance between two observed variables (livestock and human occupancy). This is the only residual correlation that I specify in my model, but the model output reports covariances for what seems to be every (or almost every) pairwise combination of my observed variables. How can I get it to stop doing that and give me only the one covariance estimate that I want? All code and output are below. Do the warning messages have something to do with this issue? I'll admit I'm not sure what the second warning means, but regarding the first warning, I do think the model was properly identified.
> modelall <- '
# regressions
lion_occ ~ mgmt01 + avgprecip + pctsavanna + riverkm_perkm2 + roadkm_perkm2 +
    distboundarykm + distedgekm + logprey + competitor_occ + human_occ + livest_occ
logprey ~ mgmt01 + riverkm_perkm2 + roadkm_perkm2 + distedgekm + human_occ + livest_occ +
    distboundarykm + avg_FRP + avgprecip + pctsavanna
competitor_occ ~ mgmt01 + avgprecip + pctsavanna + riverkm_perkm2 + roadkm_perkm2 +
    distboundarykm + distedgekm + logprey + human_occ + livest_occ
# residual correlations
livest_occ ~~ human_occ
'
> sem.all <- sem(modelall, data=gridcovar, se="bootstrap", bootstrap=1000)
Warning messages:
1: In lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, :
lavaan WARNING:
The variance-covariance matrix of the estimated parameters (vcov)
does not appear to be positive definite! The smallest eigenvalue
(= -1.808622e-06) is smaller than zero. This may be a symptom that
the model is not identified.
2: In lavaan::lavaan(model = modelall, data = gridcovar, se = "bootstrap", :
lavaan WARNING: not all elements of the gradient are (near) zero;
the optimizer may not have found a local solution;
use lavInspect(fit, "optim.gradient") to investigate
> summary(sem.all, standardized=TRUE, fit.measures=TRUE)
lavaan 0.6-3 ended normally after 229 iterations
Optimization method NLMINB
Number of free parameters 73
Number of observations 204
Estimator ML
Model Fit Test Statistic 43.387
Degrees of freedom 18
P-value (Chi-square) 0.001
Model test baseline model:
Minimum Function Test Statistic 556.063
Degrees of freedom 50
P-value 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.950
Tucker-Lewis Index (TLI) 0.861
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3259.212
Loglikelihood unrestricted model (H1) -3237.518
Number of free parameters 73
Akaike (AIC) 6664.423
Bayesian (BIC) 6906.646
Sample-size adjusted Bayesian (BIC) 6675.360
Root Mean Square Error of Approximation:
RMSEA 0.083
90 Percent Confidence Interval 0.052 0.115
P-value RMSEA <= 0.05 0.042
Standardized Root Mean Square Residual:
SRMR 0.063
Parameter Estimates:
Standard Errors Bootstrap
Number of requested bootstrap draws 1000
Number of successful bootstrap draws 1000
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
lion_occ ~
mgmt01 -0.008 0.004 -2.143 0.032 -0.008 -0.172
avgprecip -0.000 0.000 -1.340 0.180 -0.000 -0.091
pctsavanna 0.033 0.020 1.652 0.098 0.033 0.124
riverkm_perkm2 0.000 0.000 1.412 0.158 0.000 0.113
roadkm_perkm2 0.000 0.001 0.137 0.891 0.000 0.009
distboundarykm 0.000 0.001 0.825 0.409 0.000 0.069
distedgekm 0.000 0.000 1.321 0.187 0.000 0.081
logprey 0.002 0.003 0.703 0.482 0.002 0.040
competitor_occ 0.044 0.022 2.013 0.044 0.044 0.213
human_occ 0.139 0.049 2.845 0.004 0.139 0.406
livest_occ -0.017 0.069 -0.240 0.811 -0.017 -0.039
logprey ~
mgmt01 0.063 0.070 0.908 0.364 0.063 0.064
riverkm_perkm2 0.000 0.000 2.049 0.040 0.000 0.139
roadkm_perkm2 0.045 0.013 3.442 0.001 0.045 0.222
distedgekm 0.011 0.004 2.568 0.010 0.011 0.178
human_occ -0.445 0.740 -0.601 0.548 -0.445 -0.065
livest_occ 0.052 0.947 0.055 0.956 0.052 0.006
distboundarykm 0.001 0.010 0.124 0.901 0.001 0.009
avg_FRP -0.011 0.006 -1.683 0.092 -0.011 -0.115
avgprecip 0.003 0.008 0.355 0.722 0.003 0.028
pctsavanna -1.600 0.430 -3.723 0.000 -1.600 -0.295
competitor_occ ~
mgmt01 -0.159 0.013 -11.833 0.000 -0.159 -0.671
avgprecip 0.004 0.001 3.637 0.000 0.004 0.175
pctsavanna 0.109 0.074 1.476 0.140 0.109 0.084
riverkm_perkm2 -0.000 0.000 -0.128 0.898 -0.000 -0.007
roadkm_perkm2 0.003 0.003 1.018 0.309 0.003 0.054
distboundarykm 0.000 0.001 0.256 0.798 0.000 0.008
distedgekm -0.000 0.000 -0.156 0.876 -0.000 -0.005
logprey 0.017 0.011 1.636 0.102 0.017 0.072
human_occ -0.410 0.178 -2.302 0.021 -0.410 -0.250
livest_occ 0.743 0.297 2.499 0.012 0.743 0.360
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
human_occ ~~
livest_occ 0.003 0.000 6.361 0.000 0.003 0.772
mgmt01 ~~
avgprecip -1.200 0.148 -8.097 0.000 -1.200 -0.483
pctsavanna 0.007 0.003 2.163 0.031 0.007 0.149
riverkm_perkm2 -83.785 44.061 -1.902 0.057 -83.785 -0.140
roadkm_perkm2 -0.223 0.083 -2.676 0.007 -0.223 -0.190
distboundarykm 0.291 0.113 2.566 0.010 0.291 0.166
distedgekm -0.034 0.277 -0.123 0.902 -0.034 -0.009
avg_FRP 0.303 0.178 1.695 0.090 0.303 0.117
avgprecip ~~
pctsavanna -0.192 0.032 -6.005 0.000 -0.192 -0.424
riverkm_perkm2 867.678 471.086 1.842 0.065 867.678 0.142
roadkm_perkm2 2.594 0.742 3.493 0.000 2.594 0.217
distboundarykm -1.745 1.077 -1.620 0.105 -1.745 -0.098
distedgekm -0.278 2.782 -0.100 0.920 -0.278 -0.007
avg_FRP 0.576 1.480 0.389 0.697 0.576 0.022
pctsavanna ~~
riverkm_perkm2 -31.479 7.781 -4.045 0.000 -31.479 -0.288
roadkm_perkm2 -0.063 0.016 -3.926 0.000 -0.063 -0.293
distboundarykm 0.110 0.021 5.367 0.000 0.110 0.345
distedgekm 0.054 0.045 1.186 0.235 0.054 0.075
avg_FRP -0.067 0.030 -2.245 0.025 -0.067 -0.142
riverkm_perkm2 ~~
roadkm_perkm2 784.908 206.882 3.794 0.000 784.908 0.271
distboundarykm -638.597 266.009 -2.401 0.016 -638.597 -0.148
distedgekm 2651.124 622.777 4.257 0.000 2651.124 0.276
avg_FRP 213.120 397.372 0.536 0.592 213.120 0.034
roadkm_perkm2 ~~
distboundarykm -2.038 0.493 -4.131 0.000 -2.038 -0.241
distedgekm 0.330 1.317 0.250 0.802 0.330 0.017
avg_FRP 0.361 0.755 0.479 0.632 0.361 0.029
distboundarykm ~~
distedgekm 3.655 1.744 2.096 0.036 3.655 0.130
avg_FRP -1.400 1.427 -0.981 0.327 -1.400 -0.075
distedgekm ~~
avg_FRP -0.118 3.140 -0.038 0.970 -0.118 -0.003
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.lion_occ 0.000 0.000 5.528 0.000 0.000 0.707
.logprey 0.173 0.020 8.741 0.000 0.173 0.726
.competitor_occ 0.005 0.001 5.608 0.000 0.005 0.346
human_occ 0.005 0.001 6.165 0.000 0.005 1.000
livest_occ 0.003 0.001 5.619 0.000 0.003 1.000
mgmt01 0.244 0.006 42.523 0.000 0.244 1.000
avgprecip 25.290 2.607 9.701 0.000 25.290 1.000
pctsavanna 0.008 0.001 10.647 0.000 0.008 1.000
riverkm_perkm2 1474505.124 111777.802 13.191 0.000 1474505.124 1.000
roadkm_perkm2 5.672 0.580 9.783 0.000 5.672 1.000
distboundarykm 12.603 1.374 9.173 0.000 12.603 1.000
distedgekm 62.778 5.988 10.485 0.000 62.778 1.000
avg_FRP 27.339 4.955 5.517 0.000 27.339 1.000
Thank you in advance for any help you can give!
Best Answer
Welcome Kirby!
lavaan usually defaults to estimating correlations between observed variables (and between latent variables, when you specify them, which it doesn't appear you have) unless you tell it otherwise. lavaan provides a shorthand option for overriding this default when dealing with latent variables (using `orthogonal = TRUE` in `cfa` or `sem`), but this won't help you here because all of your correlations are among observed variables. You'll need to manually fix each of these to zero, thereby indicating that you are not interested in estimating them and are comfortable assuming they take on a value of 0.

The tutorial materials on the lavaan website give a good overview of how to fix parameters in this fashion. As an example, fixing all the correlations involving the `mgmt01` variable to 0 means premultiplying the right-hand variable by 0 in each covariance statement, e.g. `mgmt01 ~~ 0*avgprecip`, and likewise for each of the other variables.

The tl;dr here is that with lavaan, it's often valuable (though potentially annoying) to specify everything that you do and do not want estimated, in order to be sure you're getting exactly the model you want.
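As a rough sketch of what that fixing might look like for the model in the question: the variable names below come from the output above, while `exo`, `exo_pairs`, `mgmt_zero`, `all_zero`, and `model_fixed` are names made up for illustration. Each unwanted covariance is written with lavaan's `0*` premultiplier, and generating the lines with base R avoids typing all the pairs by hand.

```r
# Model syntax copied from the question.
modelall <- '
# regressions
lion_occ ~ mgmt01 + avgprecip + pctsavanna + riverkm_perkm2 + roadkm_perkm2 +
    distboundarykm + distedgekm + logprey + competitor_occ + human_occ + livest_occ
logprey ~ mgmt01 + riverkm_perkm2 + roadkm_perkm2 + distedgekm + human_occ + livest_occ +
    distboundarykm + avg_FRP + avgprecip + pctsavanna
competitor_occ ~ mgmt01 + avgprecip + pctsavanna + riverkm_perkm2 + roadkm_perkm2 +
    distboundarykm + distedgekm + logprey + human_occ + livest_occ
# residual correlations
livest_occ ~~ human_occ
'

# Exogenous predictors whose mutual covariances show up in the output.
exo <- c("mgmt01", "avgprecip", "pctsavanna", "riverkm_perkm2",
         "roadkm_perkm2", "distboundarykm", "distedgekm", "avg_FRP")

# Only the covariances involving mgmt01, each fixed to zero.
mgmt_zero <- paste0("mgmt01 ~~ 0*", setdiff(exo, "mgmt01"))
cat(mgmt_zero, sep = "\n")

# Every pairwise covariance among the exogenous predictors, fixed to zero.
exo_pairs <- combn(exo, 2)
all_zero <- paste0(exo_pairs[1, ], " ~~ 0*", exo_pairs[2, ])

# Append the constraints to the original model syntax.
model_fixed <- paste(c(modelall, all_zero), collapse = "\n")

# Fitting needs the asker's data frame `gridcovar`, which is not available here:
# sem.fixed <- lavaan::sem(model_fixed, data = gridcovar)
```

Keep in mind that fixing these covariances is a substantive constraint rather than a display setting: each one fixed to zero adds a degree of freedom, so the fit statistics will generally change.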
Regarding the warning messages you're getting, I find it's sometimes helpful to sketch out your path diagram to make sure identification isn't a problem and that you haven't coded a linear dependency somewhere. In this case, I share your intuition that identification isn't the problem. A more plausible candidate, in my opinion, is that you're asking for an awful lot of estimates and inferences (73 free parameters) from a relatively modest sample (204 observations), and estimation errors under these kinds of conditions aren't uncommon. This might clear up after you constrain to zero all the correlations you don't want; otherwise, it might be a case where you need more data.
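The second warning itself points at `lavInspect(fit, "optim.gradient")`. A minimal sketch of that check, together with a look at the eigenvalues of the parameter covariance matrix that the first warning complains about, might look like the following. It uses lavaan's bundled `PoliticalDemocracy` data and a small made-up path model (`toy_model`) purely as stand-ins, since `gridcovar` isn't available here.

```r
library(lavaan)

# Stand-in path model on lavaan's example data (not the asker's model).
toy_model <- '
  y1 ~ x1 + x2
  y2 ~ y1 + x3
'
fit <- sem(toy_model, data = PoliticalDemocracy)

# Did the optimizer report convergence?
converged <- lavInspect(fit, "converged")
print(converged)

# Gradient of the objective at the solution; entries far from zero
# suggest the optimizer stopped short of a (local) optimum.
grad <- lavInspect(fit, "optim.gradient")
print(round(grad, 6))

# Eigenvalues of the estimated parameter covariance matrix; a clearly
# negative one is what triggers the "not positive definite" warning.
vc <- unclass(lavInspect(fit, "vcov"))
eig <- eigen(vc, symmetric = TRUE, only.values = TRUE)$values
print(min(eig))
```

With the real fit you would pass `sem.all` in place of `fit`. The smallest eigenvalue reported in the question (about -1.8e-06) is tiny relative to the scale of some of the variances in the output, so inspecting the full set of eigenvalues may help judge whether it reflects a numerical issue rather than a structural one.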