I'm in the process of fitting measurement invariance models using the semTools
package for R. For the basic measurement model–not distinguishing between groups–I used a robust ML estimator (MLR
), and everything seemed fine. But I noticed that when I used this robust estimator (or an alternative robust estimator like MLM
) when testing measurement invariance, my models yielded fit indexes that seemed highly implausible, if not impossible. Specifically, the CFI
and the RMSEA
both improve after constraining loadings and intercepts to equality, rather than worsening (as they ought to). When I re-run these models with a traditional ML estimator, however, this problem disappears (but then I do not get the benefits of correcting my test statistic for non-normality in my indicators).
Any guidance or ideas about what may be going on here would be seriously appreciated.
Update 1
Okay, as requested by Jeremy Miles, here's an update with some model code and output, in order to better diagnose the issue:
Update 2
Posted some factor loadings and latent covariances from both groups, to help diagnose why the scaling factor is changing so much between models.
Model Syntax
library(semTools)
library (lavaan)
RNS.invar<-"
manage =~ RNS_3 + RNS_8 + RNS_6 + RNS_19 + RNS_13 + RNS_4 + RNS_12 + RNS_15 + RNS_1 + RNS_22 + RNS_18 + RNS_5 + RNS_24
agree =~ RNS_8 + RNS_19 + RNS_13 + RNS_12 + RNS_15 + RNS_1 + RNS_18 + RNS_17 + RNS_11 + RNS_14 + RNS_7 + RNS_9 + RNS_16 + RNS_5 + RNS_2
explicit =~ RNS_3 + RNS_20 + RNS_4 + RNS_1 + RNS_22 + RNS_18 + RNS_17 + RNS_9 + RNS_16 + RNS_10 + RNS_5 + RNS_2
punish =~ RNS_12 + RNS_22 + RNS_9 + RNS_5 + RNS_21 + RNS_20 + RNS_23 + RNS_24
"
invariance=measurementInvariance(RNS.invar, data=dat, group="Male", estimator="MLR")
Output: invariance$fit.configural
Model Fit Indexes
Estimator ML Robust
Minimum Function Test Statistic 983.743 890.242
Degrees of freedom 444 444
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.105
for the Yuan-Bentler correction
Comparative Fit Index (CFI) 0.858 0.846
RMSEA 0.100 0.091
Factor Loadings
Group 1 [1]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage =~
RNS_3 1.000 1.428 0.865
RNS_8 0.813 0.088 9.201 0.000 1.161 0.673
RNS_6 0.988 0.088 11.189 0.000 1.411 0.811
RNS_19 0.889 0.116 7.680 0.000 1.270 0.694
Group 2 [2]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage =~
RNS_3 1.000 1.549 0.907
RNS_8 0.798 0.091 8.741 0.000 1.237 0.823
RNS_6 0.982 0.102 9.617 0.000 1.521 0.861
RNS_19 0.728 0.121 6.002 0.000 1.127 0.695
Latent Covariances
Group 1 [1]:
Covariances:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage ~~
agree 0.190 0.098 1.948 0.051 0.382 0.382
explicit 0.004 0.020 0.183 0.855 0.174 0.174
punish 0.056 0.051 1.095 0.274 0.324 0.324
agree ~~
explicit -0.002 0.012 -0.177 0.859 -0.400 -0.400
punish 0.001 0.005 0.171 0.865 0.021 0.021
explicit ~~
punish -0.000 0.001 -0.152 0.879 -0.085 -0.085
Group 2 [2]:
Covariances:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage ~~
agree 0.063 0.066 0.963 0.336 0.137 0.137
explicit 0.039 0.167 0.232 0.817 0.877 0.877
punish 0.015 0.257 0.059 0.953 0.415 0.415
agree ~~
explicit 0.000 0.003 0.184 0.854 0.057 0.057
punish -0.001 0.023 -0.059 0.953 -0.195 -0.195
explicit ~~
punish 0.000 0.006 0.057 0.955 0.519 0.519
Output: invariance$fit.loadings
Model Fit Indexes
Estimator ML Robust
Minimum Function Test Statistic 1021.659 861.367
Degrees of freedom 488 488
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.186
for the Yuan-Bentler correction
Comparative Fit Index (CFI) 0.859 0.871
RMSEA 0.095 0.079
Factor Loadings
Group 1 [1]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage =~
RNS_3 1.000 1.462 0.873
RNS_8 (.p2.) 0.791 0.054 14.673 0.000 1.156 0.678
RNS_6 (.p3.) 0.981 0.053 18.469 0.000 1.435 0.817
RNS_19 (.p4.) 0.777 0.079 9.807 0.000 1.136 0.644
Group 2 [2]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage =~
RNS_3 1.000 1.536 0.913
RNS_8 (.p2.) 0.791 0.054 14.673 0.000 1.215 0.800
RNS_6 (.p3.) 0.981 0.053 18.469 0.000 1.508 0.863
RNS_19 (.p4.) 0.777 0.079 9.807 0.000 1.194 0.698
Latent Covariances
Group 1 [1]:
Covariances:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage ~~
agree 0.177 0.049 3.592 0.000 0.392 0.392
explicit 0.001 0.001 1.683 0.092 0.164 0.164
punish 0.005 0.002 2.877 0.004 0.304 0.304
agree ~~
explicit -0.001 0.000 -3.217 0.001 -0.389 -0.389
punish 0.000 0.000 0.000 1.000 0.000 0.000
explicit ~~
punish -0.000 0.000 -0.479 0.632 -0.061 -0.061
Group 2 [2]:
Covariances:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage ~~
agree 0.113 0.063 1.798 0.072 0.221 0.221
explicit 0.004 0.001 2.745 0.006 0.335 0.335
punish 0.010 0.002 3.948 0.000 0.455 0.455
agree ~~
explicit -0.001 0.000 -4.246 0.000 -0.527 -0.527
punish -0.000 0.001 -0.476 0.634 -0.060 -0.060
explicit ~~
punish 0.000 0.000 1.130 0.259 0.149 0.149
Output: invariance$fit.intercepts
Model Fit Indexes
Estimator ML Robust
Minimum Function Test Statistic 1049.546 754.119
Degrees of freedom 508 508
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.392
for the Yuan-Bentler correction
Comparative Fit Index (CFI) 0.857 0.915
RMSEA 0.094 0.063
Factor Loadings
Group 1 [1]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage =~
RNS_3 1.000 1.460 0.872
RNS_8 (.p2.) 0.802 0.047 17.113 0.000 1.171 0.678
RNS_6 (.p3.) 0.986 0.044 22.531 0.000 1.440 0.817
RNS_19 (.p4.) 0.786 0.054 14.597 0.000 1.148 0.648
Group 2 [2]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage =~
RNS_3 1.000 1.532 0.913
RNS_8 (.p2.) 0.802 0.047 17.157 0.000 1.229 0.800
RNS_6 (.p3.) 0.986 0.044 22.278 0.000 1.512 0.863
RNS_19 (.p4.) 0.786 0.054 14.504 0.000 1.205 0.700
Latent Covariances
Group 1 [1]:
Covariances:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage ~~
agree 0.175 0.044 3.964 0.000 0.391 0.391
explicit 0.001 0.001 1.711 0.087 0.167 0.167
punish 0.004 0.001 3.116 0.002 0.306 0.306
agree ~~
explicit -0.001 0.000 -5.071 0.000 -0.391 -0.391
punish 0.000 0.000 0.003 0.997 0.000 0.000
explicit ~~
punish -0.000 0.000 -0.338 0.735 -0.059 -0.059
Group 2 [2]:
Covariances:
Estimate Std.Err Z-value P(>|z|) Std.lv Std.all
manage ~~
agree 0.111 0.054 2.045 0.041 0.220 0.220
explicit 0.003 0.001 3.713 0.000 0.340 0.340
punish 0.007 0.001 5.103 0.000 0.454 0.454
agree ~~
explicit -0.001 0.000 -6.576 0.000 -0.530 -0.530
punish -0.000 0.000 -0.520 0.603 -0.062 -0.062
explicit ~~
punish 0.000 0.000 1.139 0.255 0.153 0.153
Summary
As I would expect, the $\chi^2$ statistic increases with each added level of invariance, when estimated using the boring ol' ML estimator. But the robust estimates of the $\chi^2$ statistic are decreasing with each added level of invariance constraints, when estimated using the MLR estimator…
Best Answer
The source of your problem is the 'robust' estimation of standard errors using the robust Satorra-Bentler Chi-square statistic. When testing for measurement invariance, we compare less constrained (configural invariance) to more constrained (metric or scalar invariance) models. The comparison that is usually applied is a Chi-square difference test, which compares the Chi-square of a less constrained to a more constrained model, testing the null hypothesis that both models have the same fit.
In addition some authors argue that one may also look at the change in RMSEA or CFI, but there are no strong advices on which change in these statistics is desired. Therefore my advice is to first of all look at change in model Chi-square and the associated p-value for above mentioned null hypothesis. I will therefore first answer your question in terms of Chi-square change and then address the change in CFI and RMSEA
Testing the change in model Chi-square
MLR uses a scaled version of Chi-square to find robust standard errors following a paper by Satorra and Bentler in Psychometrica. The problem you are facing now is that, as you say, the (scaled) Chi-squares decrease across more constrained version of the model. In fact, the simple scaled Chi-square differences between your models is negative and thus undefined.
This behavior can be expected because the difference in scaled Chi-squares is not Chi-square distributed. A Chi-square difference test using scaled Chi-squares needs to be adapted before the Chi-square difference can be interpreted in the usual way. Specifically, the adjustment goes as follows. First we calculate a scaling correction factor:
$$s= (d_0c_0-d_1c_1)/(d_0-c_1)$$
where $d_0$ is the degrees of freedom of the nested (constrained) model and $d_1$ in the unconstrained model. Furthermore $c_1$ and $c_0$ are the scaling correction factors reported by
lavaan
or other SEM packages like Mplus. Subsequently, we calculate a corrected Chi-sqaure difference$$ \Delta_{\chi} = (T_0c_0 - T_1c_1)/ s $$
where $T_0$ and $T_1$ are the scaled (robust) model Chi-squares. This adjusted Chi-square is then tested on a central Chi-square distribution with degrees of freedom equal to the difference in degrees of freedom of the two models.
To provide an example for your data for testing configural against metric invariance in
R
, we use a short script:We can see that the scaled Chi-square difference is 18.9 (and now it has a positive sign!), which when tested with $\alpha=.05$ type-1 error probability is not significant. Hence there is evidence for metric invariance in your data.
There is a lot of documentation on this problem on the Mplus website. See here for a discussion of difference testing with scaled Chi-square. The correction I suggest is the simple adjustment variant which in some cases may still yield negative Chi-square. There is a more recent and more sophisticated approach called the strictly positive Chi-square difference. It is described on the Mplus website I linked.
Decrease in fit indices (RMSEA and CFI):
It was remarked that my answer did not yet sufficiently address the RMSEA and CFI increase that was observed over increasingly constrained versions of a baseline model. To understand this we first of all need to refer to the definitions of the two statistics:
$$RMSEA = \frac{(\chi ^2-df)^{\frac{1}{2}}}{df(n-1)}$$
and
$$CFI = \frac{ (\chi_0^2 - df_0) - (\chi_1^2 - df_1) }{ (\chi_0^2 - df_0)}$$
where $0$ and $1$ indicate the null model and the tested model respectively. It can be seen that both fit measures depend on $\chi^2$ and the $df$ of the model. The scaled $\chi^2$ is designed in a way to be more 'robust' to many practical problems, in particular the violation of multi-variate noramlity in continuous factor analysis. If we assume scaled $\chi^2$ is a more valid version than unscaled $\chi^2$, we may conclude that also ´scaled rmsea´ and ´scaled cfi´ are more precise versions. In
lavaan
you therefore need to check that you looked at the correctscaled rmsea
andscaled cfi
.Assuming that you did this already, it can be seen from the definitions of the two indices that a decrease in RMSEA and CFI across more constraint versions of the model is actually possible, in fact it is desirable!
To see this, we first of all assume that the chi-square of the constrained and unconstrained models does not change. This means that the more strict model is true. However the number of parameters in the model decreases, thus $df$'s increase. Now let $a$ denote the unconstrained (e.g. configural) and $b$ the constrained (e.g. metric) model. So we know that $df_a<df_b$ while assuming $\chi^2_a =\chi^2_b = \chi^2$ (i.e. no decrease in fit / more constrained model is true). Now we wonder if it is possible whether
$$RMSEA_a > RMSEA_b$$
as well as
$$CFI_a > CFI_b$$
It is particularly easy to see this for $CFI$, because there we have
$$CFI_a > CFI_b \Leftrightarrow (\chi_a^2 - df_a) - (\chi_b^2 - df_b) > 0 \\ \Leftrightarrow (\chi^2 - df_a) - (\chi^2 - df_b) > 0 \\ \Leftrightarrow df_b > df_a $$
which is always true if $\chi^2_a =\chi^2_b = \chi^2$. Hence $CFI$ of the more constrained model can be smaller than that of the unconstrained model and necessarily is when fit of the two models is exctly equal. For RMSEA the situation is a little bit more complicated because the inequality involves squared terms of $\chi^2$, $df_a$ and $df_b$. This suggests that the solution under the assumption $\chi^2_a = \chi^2_b$ depends on their particular values, but under certain combinations the inequality will hold as well.
Hence in conclusion, what you observe is possible. In particular we are more likely to find it in situations when the model $\chi^2$ only marginally changes while the amoung of additionally constrained parameters is large. This is exectly the result we get when a more constrained model is the true model and the less constrained model was specified too 'flexible' (over-parametrized). Thus decrease in the two fit measures is even better news than a (small) increase!