The source of your problem is the 'robust' estimation, which replaces the usual Chi-square with the Satorra-Bentler scaled Chi-square statistic. When testing for measurement invariance, we compare less constrained (configural invariance) with more constrained (metric or scalar invariance) models. The comparison that is usually applied is a Chi-square difference test, which compares the Chi-square of the less constrained model with that of the more constrained one, testing the null hypothesis that both models fit equally well.
In addition, some authors argue that one may also look at the change in RMSEA or CFI, but there are no firm guidelines on how large a change in these statistics is acceptable. My advice is therefore to look first of all at the change in model Chi-square and the associated p-value for the above-mentioned null hypothesis. I will thus first answer your question in terms of the Chi-square change and then address the change in CFI and RMSEA.
Testing the change in model Chi-square
MLR uses a scaled version of the Chi-square to obtain robust standard errors, following a paper by Satorra and Bentler in Psychometrika. The problem you are facing is that, as you say, the (scaled) Chi-squares decrease across more constrained versions of the model. In fact, the simple scaled Chi-square difference between your models is negative and thus undefined.
This behavior can be expected because the difference in scaled Chi-squares is not Chi-square distributed. A Chi-square difference test using scaled Chi-squares needs to be adapted before the Chi-square difference can be interpreted in the usual way. Specifically, the adjustment goes as follows. First we calculate a scaling correction factor:
$$s= (d_0c_0-d_1c_1)/(d_0-d_1)$$
where $d_0$ is the degrees of freedom of the nested (constrained) model and $d_1$ that of the unconstrained model. Furthermore, $c_0$ and $c_1$ are the scaling correction factors reported by lavaan or other SEM packages such as Mplus. Subsequently, we calculate a corrected Chi-square difference
$$ \Delta_{\chi} = (T_0c_0 - T_1c_1)/ s $$
where $T_0$ and $T_1$ are the scaled (robust) model Chi-squares. This adjusted Chi-square is then tested against a central Chi-square distribution with degrees of freedom equal to the difference in degrees of freedom of the two models.
To provide an example for your data, testing configural against metric invariance in R, we use a short script:
d0 <- 488     # df of the constrained (metric) model, as in your output
d1 <- 444     # df of the unconstrained (configural) model
c0 <- 1.186   # scaling correction factor of the constrained model
c1 <- 1.105   # scaling correction factor of the unconstrained model
T0 <- 861.367 # scaled Chi-square of the constrained model
T1 <- 890.242 # scaled Chi-square of the unconstrained model

(cd <- (d0 * c0 - d1 * c1) / (d0 - d1)) # scaling correction factor
# [1] 2.003364
(TRd <- (T0 * c0 - T1 * c1) / cd)       # adjusted difference in model Chi-squares
# [1] 18.90014
(df <- d0 - d1)                         # difference in degrees of freedom
# [1] 44
1 - pchisq(TRd, df)                     # p-value
# [1] 0.9996636
We can see that the scaled Chi-square difference is 18.9 (and it now has a positive sign!), which, when tested at a Type I error probability of $\alpha=.05$, is not significant. Hence there is no evidence against metric invariance in your data.
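For repeated use across several invariance steps, the computation can be wrapped in a small base-R helper; the function name `scaled_chisq_diff` is my own, not part of any package:

```r
# Satorra-Bentler (2001) scaled Chi-square difference test.
# T0, c0, d0: scaled Chi-square, scaling correction factor and df
#             of the constrained (nested) model;
# T1, c1, d1: the same quantities for the unconstrained model.
scaled_chisq_diff <- function(T0, c0, d0, T1, c1, d1) {
  cd  <- (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling correction of the difference
  TRd <- (T0 * c0 - T1 * c1) / cd         # adjusted Chi-square difference
  df  <- d0 - d1
  c(chisq_diff = TRd, df = df, p_value = 1 - pchisq(TRd, df))
}

# The values from the script above reproduce the same result:
scaled_chisq_diff(T0 = 861.367, c0 = 1.186, d0 = 488,
                  T1 = 890.242, c1 = 1.105, d1 = 444)
# chisq_diff ~ 18.90, df = 44, p ~ 0.9997
```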
There is extensive documentation on this problem on the Mplus website. See here for a discussion of difference testing with scaled Chi-square. The correction I suggest is the simple adjustment, which in some cases may still yield a negative Chi-square difference. There is a more recent and more sophisticated approach, the strictly positive Chi-square difference test, which is also described on the linked Mplus page.
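As a convenience, recent versions of lavaan can carry out this scaled difference test for you: if I remember the interface correctly, `lavTestLRT()` (also reachable via `anova()`) applies a Satorra-Bentler-type correction by default when the models were fitted with `estimator = "MLR"`. A sketch using lavaan's built-in `HolzingerSwineford1939` data rather than your own:

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6 '

# Configural model: same pattern of loadings in both groups
fit.config <- cfa(HS.model, data = HolzingerSwineford1939,
                  group = "school", estimator = "MLR")

# Metric model: loadings constrained to be equal across groups
fit.metric <- cfa(HS.model, data = HolzingerSwineford1939,
                  group = "school", estimator = "MLR",
                  group.equal = "loadings")

# Scaled Chi-square difference test
lavTestLRT(fit.config, fit.metric)
```

If the default correction yields a negative difference, `lavTestLRT()` also accepts a `method` argument (I believe `method = "satorra.bentler.2010"`) that implements the strictly positive variant mentioned above; check `?lavTestLRT` for the exact options of your lavaan version.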
Decrease in fit indices (RMSEA and CFI):
It was remarked that my answer did not yet sufficiently address the changes in RMSEA and CFI that were observed over increasingly constrained versions of a baseline model. To understand this, we first of all need to refer to the definitions of the two statistics:
$$RMSEA = \sqrt{\frac{\max(\chi^2-df,\,0)}{df\,(n-1)}}$$
and
$$CFI = \frac{ (\chi_0^2 - df_0) - (\chi_1^2 - df_1) }{ (\chi_0^2 - df_0)}$$
where $0$ and $1$ indicate the null model and the tested model, respectively. It can be seen that both fit measures depend on the $\chi^2$ and the $df$ of the model. The scaled $\chi^2$ is designed to be more 'robust' against many practical problems, in particular the violation of multivariate normality in continuous factor analysis. If we regard the scaled $\chi^2$ as a more valid version than the unscaled $\chi^2$, we may conclude that the scaled RMSEA and scaled CFI are likewise more precise versions. In lavaan you therefore need to check that you looked at the scaled RMSEA and the scaled CFI.
Assuming that you did this already, it can be seen from the definitions of the two indices that a decrease in RMSEA combined with an increase in CFI across more constrained versions of the model is actually possible; in fact, it is desirable!
To see this, first assume that the Chi-square of the constrained and the unconstrained model does not change; this is what we expect when the stricter model is true. The number of free parameters nevertheless decreases, so the $df$ increase. Now let $a$ denote the unconstrained (e.g. configural) and $b$ the constrained (e.g. metric) model. We thus know that $df_a<df_b$ while assuming $\chi^2_a =\chi^2_b = \chi^2$ (i.e. no decrease in fit; the more constrained model is true). Now we ask whether it is possible that
$$RMSEA_a > RMSEA_b$$
as well as
$$CFI_a < CFI_b$$
It is particularly easy to see this for $CFI$, because there we have
$$CFI_a < CFI_b \Leftrightarrow (\chi_a^2 - df_a) - (\chi_b^2 - df_b) > 0 \\
\Leftrightarrow (\chi^2 - df_a) - (\chi^2 - df_b) > 0 \\
\Leftrightarrow df_b > df_a $$
which is always true under our assumption. Hence the $CFI$ of the more constrained model can be larger than that of the unconstrained model, and it necessarily is when the fit of the two models is exactly equal. For RMSEA, note that the definition can be rewritten as $RMSEA=\sqrt{(\chi^2/df-1)/(n-1)}$ whenever $\chi^2>df$; for fixed $\chi^2$ this is decreasing in $df$, so under the assumption $\chi^2_a = \chi^2_b$ we also have $RMSEA_a > RMSEA_b$.
Hence, in conclusion, what you observe is possible. In particular, we are more likely to find it in situations where the model $\chi^2$ changes only marginally while the number of additionally constrained parameters is large. This is exactly the result we get when the more constrained model is the true model and the less constrained model was specified too 'flexibly' (over-parametrized). Thus an improvement in the two fit measures (a lower RMSEA and a higher CFI) is even better news than a (small) deterioration!
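The direction of these inequalities can also be checked numerically in base R, directly from the definitions above (a sketch with made-up values for $n$ and the null model; the $df$ are those from the example):

```r
n <- 300; chisq <- 890       # common model Chi-square (held equal by assumption)
chisq0 <- 3000; df0 <- 550   # null model Chi-square and df (made-up values)
df_a <- 444; df_b <- 488     # configural vs. metric df, as in the example

rmsea <- function(chisq, df) sqrt(max(chisq - df, 0) / (df * (n - 1)))
cfi   <- function(chisq, df) ((chisq0 - df0) - (chisq - df)) / (chisq0 - df0)

rmsea(chisq, df_a) > rmsea(chisq, df_b)  # TRUE: RMSEA falls as df increase
cfi(chisq, df_a)   < cfi(chisq, df_b)    # TRUE: CFI rises as df increase
```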