High SRMR despite good fit based on other indices in SEM (latent growth curve)

Tags: curve-fitting, goodness-of-fit, latent-variable, mplus, structural-equation-modeling

Context: Latent growth curve model for a continuous variable with 15 time points. One intercept, two slopes (for the first and second half of the time points). N=146, >90% data coverage, MLR estimator in Mplus, no covariates.

Problem: High SRMR=0.20 despite good fit based on other indices: chi-square goodness of fit p=.08, RMSEA=0.04, CFI/TLI=0.97.

My beginner's attempt: There is one time point where the standardized residual (z-score) for the mean is significant (freeing that time point didn't resolve the problem), and another time point that has large and significant standardized residuals (z-scores) for its correlations with other time points. Might the combination of the two have driven the SRMR up?

Question: I can't think of any theoretical reason why this is happening… would this affect the overall interpretation of the model? What can I do to fix it?

Best Answer

Background

The different fit indices tend to be sensitive to different forms of misspecification. Look at the formulae, where $T$ is the model chi-square test statistic, $df$ the degrees of freedom, and the subscripts $b$ and $t$ indicate the baseline and target models:

$$ TLI = \frac{(T_b / df_b) - (T_t / df_t)}{(T_b / df_b) - 1} $$

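$$ SRMR = \sqrt{\frac{2\sum_{i=1}^{p}\sum_{j=1}^{i} \left(\frac{s_{ij} - \hat{\sigma}_{ij}}{s_{ii}s_{jj}}\right)^2}{p(p+1)}} $$

Here $s_{ij}$ is the observed covariance between variables $i$ and $j$, $\hat{\sigma}_{ij}$ is the model-implied covariance, $p$ is the number of observed variables, and in this notation $s_{ii}$ and $s_{jj}$ are the observed standard deviations.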

One of the keys here is that indices like the TLI and CFI (not shown) incorporate the degrees of freedom, which means that simpler models that still fit fairly well are rewarded. The SRMR does not: it gives no credit for a more parsimonious model.
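To make the role of the degrees of freedom concrete, here is a minimal sketch of the TLI formula above. The chi-square values and degrees of freedom are made-up numbers for illustration only, not values from the question.

```python
def tli(T_b, df_b, T_t, df_t):
    """Tucker-Lewis index from baseline (b) and target (t) chi-square statistics."""
    ratio_b = T_b / df_b
    ratio_t = T_t / df_t
    return (ratio_b - ratio_t) / (ratio_b - 1.0)

# Hypothetical numbers: both target models have the same chi-square (130),
# but one reaches it with far fewer free parameters (more df).
tli_parsimonious = tli(T_b=1000.0, df_b=105, T_t=130.0, df_t=100)
tli_complex = tli(T_b=1000.0, df_b=105, T_t=130.0, df_t=60)

print(round(tli_parsimonious, 3), round(tli_complex, 3))
```

With the same amount of absolute misfit, the model that keeps more degrees of freedom gets the higher TLI. The SRMR formula has no such term.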

It is perhaps not surprising then that the different fit indices tend to be sensitive to different types of model misspecification.

A further hint is the squared term: a few badly misspecified covariances will contribute far more than many slightly misspecified ones, because each contribution drops off quadratically as the residual nears zero.
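A small sketch of the SRMR formula above shows the effect of the squared term. It assumes the divisor is the product of the observed standard deviations, and it only covers the covariance part; software implementations may differ in detail (for example in how mean residuals are treated), so do not expect it to reproduce a program's reported SRMR exactly. The matrices are toy values.

```python
import numpy as np

def srmr(S, Sigma_hat):
    """SRMR over the lower triangle (diagonal included) of a p x p covariance matrix."""
    S = np.asarray(S, dtype=float)
    Sigma_hat = np.asarray(Sigma_hat, dtype=float)
    p = S.shape[0]
    sd = np.sqrt(np.diag(S))                      # observed standard deviations
    resid = (S - Sigma_hat) / np.outer(sd, sd)    # residuals on a correlation scale
    rows, cols = np.tril_indices(p)               # all pairs with j <= i
    return float(np.sqrt(2.0 * np.sum(resid[rows, cols] ** 2) / (p * (p + 1))))

# Toy observed matrix: 4 variables, unit variances, correlations of 0.5.
p = 4
S = np.full((p, p), 0.5)
np.fill_diagonal(S, 1.0)

# Case A: a residual of 0.05 on each of the 6 off-diagonal pairs.
R_spread = np.full((p, p), 0.05)
np.fill_diagonal(R_spread, 0.0)

# Case B: the same total absolute misfit (0.30) concentrated in one pair.
R_conc = np.zeros((p, p))
R_conc[3, 0] = R_conc[0, 3] = 0.30

srmr_spread = srmr(S, S - R_spread)
srmr_conc = srmr(S, S - R_conc)
print(round(srmr_spread, 4), round(srmr_conc, 4))
```

The concentrated case gives an SRMR about two and a half times as large, even though the summed absolute residuals are identical.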

Latent Growth Models

Turning to latent growth models, the form is very specific and many parameters are fixed. You have a piecewise model, but still you have many time points (15) and are only fitting two lines. It is not at all surprising to me that there is some misspecification here. The CFI/TLI are likely relatively good because of how parsimonious the model is. That is a good sign, but the SRMR is disturbingly high. It may not change your parameters of interest much, but I would definitely want to at least figure out what part of the model was misspecified.
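To see how constrained such a model is, here is a sketch of the fixed loading matrix for a piecewise linear model with 15 time points. The knot position (the 8th time point) and the parameter count (free factor means, a full factor covariance matrix, one free residual variance per time point) are my assumptions; your actual specification may differ.

```python
import numpy as np

def piecewise_loadings(n_times=15, knot=7):
    """Fixed loadings for intercept, first slope, and second slope.

    `knot` is the zero-based index of the time point where the second
    slope takes over (a hypothetical choice here).
    """
    t = np.arange(n_times)
    intercept = np.ones(n_times)
    slope1 = np.minimum(t, knot)        # rises until the knot, then stays flat
    slope2 = np.maximum(t - knot, 0)    # zero until the knot, then rises
    return np.column_stack([intercept, slope1, slope2])

Lam = piecewise_loadings()
p, k = Lam.shape

# Sample moments available: p means plus p(p+1)/2 variances and covariances.
n_moments = p + p * (p + 1) // 2
# Assumed free parameters: k factor means, k(k+1)/2 factor (co)variances,
# and p residual variances.
n_free = k + k * (k + 1) // 2 + p
df = n_moments - n_free

print(Lam)
print(n_moments, n_free, df)
```

Under these assumptions, 24 parameters have to reproduce 135 means, variances, and covariances, which is why the degrees-of-freedom-adjusted indices can look good while some residuals remain large.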

Suggestions

The tools to determine and correct model misspecification are basically the same regardless of the problem. That is perhaps an oversimplification, but not by much.

In your case, you do not have a measurement issue (that is, you do not need to examine whether there should be alternate factors or different groupings of the items per se); however, it may be unreasonable to assume linear growth, even piecewise linear growth.

Another common area of misspecification in growth models is the error structure. It is possible, perhaps even likely, that residuals are more highly correlated between adjacent time points than between time points farther apart. If there is some cyclical pattern to the assessments (e.g., seasons, times of day, days of the week), that may also play a role.
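One rough way to check for this is to average the residual correlations by the distance (lag) between time points. The sketch below uses a made-up residual matrix with an autoregressive-looking pattern; in practice you would paste in the residual correlation matrix from your own output.

```python
import numpy as np

def mean_residual_by_lag(resid):
    """Average residual for each lag |i - j| >= 1, using the sub-diagonals."""
    resid = np.asarray(resid, dtype=float)
    p = resid.shape[0]
    return {lag: float(np.mean(np.diagonal(resid, offset=-lag)))
            for lag in range(1, p)}

# Hypothetical residual correlations for 6 time points: 0.2 at lag 1,
# halving with each additional lag.
idx = np.arange(6)
lag = np.abs(idx[:, None] - idx[None, :])
resid = np.where(lag > 0, 0.2 * 0.5 ** (lag - 1.0), 0.0)

by_lag = mean_residual_by_lag(resid)
for k, v in by_lag.items():
    print(k, round(v, 3))
```

Residuals that are largest at lag 1 and shrink with distance point toward correlated adjacent errors. A bump at a regular lag would point toward something cyclical.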

Examine the standardized residual covariances: which ones are high? What happens if you add a residual covariance to account for them? Consider relaxing the linear time constraint. You could add a quadratic time term or freely estimate some of the time loadings. You can also look at modification indices for "automated" suggestions on how to improve your model.
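If the residual output is hard to scan by eye, a few lines of code can rank the pairs of time points. This is only a sketch: it ranks residuals on a correlation scale (observed minus model-implied, divided by the observed standard deviations), which is not the same as the z-type standardized residuals your software reports, and the matrices are toy values.

```python
import numpy as np

def largest_residuals(S, Sigma_hat, top=5):
    """Return (time_i, time_j, residual) for the largest absolute residuals.

    Time points are numbered from 1; only pairs with i > j are listed.
    """
    S = np.asarray(S, dtype=float)
    Sigma_hat = np.asarray(Sigma_hat, dtype=float)
    sd = np.sqrt(np.diag(S))
    resid = (S - Sigma_hat) / np.outer(sd, sd)
    rows, cols = np.tril_indices(S.shape[0], k=-1)   # strictly below the diagonal
    order = np.argsort(-np.abs(resid[rows, cols]))[:top]
    return [(int(rows[i]) + 1, int(cols[i]) + 1, float(resid[rows[i], cols[i]]))
            for i in order]

# Toy example: 5 time points, unit variances, observed correlations of 0.5.
S = np.full((5, 5), 0.5)
np.fill_diagonal(S, 1.0)

# Hypothetical model-implied matrix that underpredicts the (4, 2) correlation
# and overpredicts the (5, 1) correlation.
Sigma_hat = S.copy()
Sigma_hat[3, 1] = Sigma_hat[1, 3] = 0.25
Sigma_hat[4, 0] = Sigma_hat[0, 4] = 0.60

worst = largest_residuals(S, Sigma_hat, top=2)
for i, j, r in worst:
    print(i, j, round(r, 3))
```

The pairs at the top of the list are the natural candidates for a residual covariance or for a closer look at the functional form around those time points.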

If all of that seems too complex or variable, try simplifying your model. Rather than fitting a piecewise model, fit a model to just the first piece (leave the second piece out for now). Make sure that your growth models are solid for each of the pieces before combining them into a model for all 15 time points. The same approaches I described can be used with the individual pieces.

What happens if the individual pieces fit well, but combined they do not? This suggests it is the relations between the pieces that are misspecified: which time points from the first and second pieces are most strongly related? What is going on around the measurements that you may need to account for, either in the functional form (linear, etc.) or with residual covariances?

At each step you can use the residual covariance matrices, modification indices, your own theoretical judgement, examination of the raw correlation matrices, and data visualization to help get a handle on these things.