Measurement Invariance Testing with Robust Estimators – Why Improved Model Fit Indexes Occur

confirmatory-factorstructural-equation-modeling

I'm in the process of fitting measurement invariance models using the semTools package for R. For the basic measurement model–not distinguishing between groups–I used a robust ML estimator (MLR), and everything seemed fine. But I noticed that when I used this robust estimator (or an alternative robust estimator like MLM) when testing measurement invariance, my models yielded fit indexes that seemed highly implausible, if not impossible. Specifically, the CFI and the RMSEA both improve after constraining loadings and intercepts to equality, rather than worsening (as they ought to). When I re-run these models with a traditional ML estimator, however, this problem disappears (but then I do not get the benefits of correcting my test statistic for non-normality in my indicators).

Any guidance or ideas about what may be going on here would be seriously appreciated.

Update 1

Okay, as requested by Jeremy Miles, here's an update with some model code and output, in order to better diagnose the issue:

Update 2

Posted some factor loadings and latent covariances from both groups, to help diagnose why the scaling factor is changing so much between models.

Model Syntax

library(semTools)
library (lavaan)

RNS.invar<-"
manage =~ RNS_3 + RNS_8 + RNS_6 + RNS_19 + RNS_13 + RNS_4 + RNS_12 + RNS_15 + RNS_1 + RNS_22 + RNS_18 + RNS_5 + RNS_24
agree =~ RNS_8 + RNS_19 + RNS_13 + RNS_12 + RNS_15 + RNS_1 + RNS_18 + RNS_17 + RNS_11 + RNS_14 + RNS_7 + RNS_9 + RNS_16 + RNS_5 + RNS_2
explicit =~ RNS_3 + RNS_20 + RNS_4 + RNS_1 + RNS_22 + RNS_18 + RNS_17 + RNS_9 + RNS_16 + RNS_10 + RNS_5 + RNS_2
punish =~ RNS_12 + RNS_22 + RNS_9 + RNS_5 + RNS_21 + RNS_20 + RNS_23 + RNS_24
"

invariance=measurementInvariance(RNS.invar, data=dat, group="Male", estimator="MLR")

Output: invariance$fit.configural

Model Fit Indexes

Estimator                                         ML      Robust
Minimum Function Test Statistic              983.743     890.242
Degrees of freedom                               444         444
P-value (Chi-square)                           0.000       0.000
Scaling correction factor                                  1.105
for the Yuan-Bentler correction
Comparative Fit Index (CFI)                    0.858       0.846
RMSEA                                          0.100       0.091

Factor Loadings

Group 1 [1]:

Latent Variables:
               Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
manage =~                                                             
RNS_3             1.000                               1.428    0.865
RNS_8             0.813    0.088    9.201    0.000    1.161    0.673
RNS_6             0.988    0.088   11.189    0.000    1.411    0.811
RNS_19            0.889    0.116    7.680    0.000    1.270    0.694

Group 2 [2]:

Latent Variables:
               Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
manage =~                                                             
RNS_3             1.000                               1.549    0.907
RNS_8             0.798    0.091    8.741    0.000    1.237    0.823
RNS_6             0.982    0.102    9.617    0.000    1.521    0.861
RNS_19            0.728    0.121    6.002    0.000    1.127    0.695

Latent Covariances

Group 1 [1]:
Covariances:
               Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
manage ~~                                                             
agree             0.190    0.098    1.948    0.051    0.382    0.382
explicit          0.004    0.020    0.183    0.855    0.174    0.174
punish            0.056    0.051    1.095    0.274    0.324    0.324
agree ~~                                                              
explicit         -0.002    0.012   -0.177    0.859   -0.400   -0.400
punish            0.001    0.005    0.171    0.865    0.021    0.021
explicit ~~                                                           
punish           -0.000    0.001   -0.152    0.879   -0.085   -0.085

Group 2 [2]:
Covariances:
               Estimate  Std.Err  Z-value  P(>|z|)   Std.lv  Std.all
manage ~~                                                             
agree             0.063    0.066    0.963    0.336    0.137    0.137
explicit          0.039    0.167    0.232    0.817    0.877    0.877
punish            0.015    0.257    0.059    0.953    0.415    0.415
agree ~~                                                              
explicit          0.000    0.003    0.184    0.854    0.057    0.057
punish           -0.001    0.023   -0.059    0.953   -0.195   -0.195
explicit ~~                                                           
punish            0.000    0.006    0.057    0.955    0.519    0.519

Output: invariance$fit.loadings

Model Fit Indexes

Estimator                                         ML      Robust
Minimum Function Test Statistic             1021.659     861.367
Degrees of freedom                               488         488
P-value (Chi-square)                           0.000       0.000
Scaling correction factor                                  1.186
for the Yuan-Bentler correction
Comparative Fit Index (CFI)                    0.859       0.871
RMSEA                                          0.095       0.079

Factor Loadings

Group 1 [1]:
Latent Variables:
               Estimate  Std.Err  Z-value    P(>|z|)   Std.lv  Std.all
manage =~                                                               
RNS_3             1.000                                 1.462    0.873
RNS_8   (.p2.)    0.791    0.054     14.673    0.000    1.156    0.678
RNS_6   (.p3.)    0.981    0.053     18.469    0.000    1.435    0.817
RNS_19  (.p4.)    0.777    0.079      9.807    0.000    1.136    0.644

Group 2 [2]:
Latent Variables:
               Estimate  Std.Err  Z-value    P(>|z|)   Std.lv  Std.all
manage =~                                                               
RNS_3             1.000                                 1.536    0.913
RNS_8   (.p2.)    0.791    0.054     14.673    0.000    1.215    0.800
RNS_6   (.p3.)    0.981    0.053     18.469    0.000    1.508    0.863
RNS_19  (.p4.)    0.777    0.079      9.807    0.000    1.194    0.698

Latent Covariances

Group 1 [1]:
Covariances:
               Estimate  Std.Err  Z-value    P(>|z|)   Std.lv  Std.all
manage ~~                                                               
agree             0.177    0.049      3.592    0.000    0.392    0.392
explicit          0.001    0.001      1.683    0.092    0.164    0.164
punish            0.005    0.002      2.877    0.004    0.304    0.304
agree ~~                                                                
explicit         -0.001    0.000     -3.217    0.001   -0.389   -0.389
punish            0.000    0.000      0.000    1.000    0.000    0.000
explicit ~~                                                             
punish           -0.000    0.000     -0.479    0.632   -0.061   -0.061

Group 2 [2]:
Covariances:
               Estimate  Std.Err  Z-value    P(>|z|)   Std.lv  Std.all
manage ~~                                                               
agree             0.113    0.063      1.798    0.072    0.221    0.221
explicit          0.004    0.001      2.745    0.006    0.335    0.335
punish            0.010    0.002      3.948    0.000    0.455    0.455
agree ~~                                                                
explicit         -0.001    0.000     -4.246    0.000   -0.527   -0.527
punish           -0.000    0.001     -0.476    0.634   -0.060   -0.060
explicit ~~                                                             
punish            0.000    0.000      1.130    0.259    0.149    0.149

Output: invariance$fit.intercepts

Model Fit Indexes

Estimator                                         ML      Robust
Minimum Function Test Statistic             1049.546     754.119
Degrees of freedom                               508         508
P-value (Chi-square)                           0.000       0.000
Scaling correction factor                                  1.392
for the Yuan-Bentler correction
Comparative Fit Index (CFI)                    0.857       0.915
RMSEA                                          0.094       0.063

Factor Loadings

Group 1 [1]:
Latent Variables:
               Estimate  Std.Err  Z-value       P(>|z|)   Std.lv  Std.all
manage =~                                                                  
RNS_3             1.000                                    1.460    0.872
RNS_8   (.p2.)    0.802    0.047        17.113    0.000    1.171    0.678
RNS_6   (.p3.)    0.986    0.044        22.531    0.000    1.440    0.817
RNS_19  (.p4.)    0.786    0.054        14.597    0.000    1.148    0.648

Group 2 [2]:
Latent Variables:
               Estimate  Std.Err  Z-value       P(>|z|)   Std.lv  Std.all
manage =~                                                                  
RNS_3             1.000                                    1.532    0.913
RNS_8   (.p2.)    0.802    0.047        17.157    0.000    1.229    0.800
RNS_6   (.p3.)    0.986    0.044        22.278    0.000    1.512    0.863
RNS_19  (.p4.)    0.786    0.054        14.504    0.000    1.205    0.700

Latent Covariances

Group 1 [1]:
Covariances:
               Estimate  Std.Err  Z-value       P(>|z|)   Std.lv  Std.all
manage ~~                                                                  
agree             0.175    0.044         3.964    0.000    0.391    0.391
explicit          0.001    0.001         1.711    0.087    0.167    0.167
punish            0.004    0.001         3.116    0.002    0.306    0.306
agree ~~                                                                   
explicit         -0.001    0.000        -5.071    0.000   -0.391   -0.391
punish            0.000    0.000         0.003    0.997    0.000    0.000
explicit ~~                                                                
punish           -0.000    0.000        -0.338    0.735   -0.059   -0.059

Group 2 [2]:
Covariances:
               Estimate  Std.Err  Z-value       P(>|z|)   Std.lv  Std.all
manage ~~                                                                  
agree             0.111    0.054         2.045    0.041    0.220    0.220
explicit          0.003    0.001         3.713    0.000    0.340    0.340
punish            0.007    0.001         5.103    0.000    0.454    0.454
agree ~~                                                                   
explicit         -0.001    0.000        -6.576    0.000   -0.530   -0.530
punish           -0.000    0.000        -0.520    0.603   -0.062   -0.062
explicit ~~                                                                
punish            0.000    0.000         1.139    0.255    0.153    0.153

Summary

As I would expect, the $\chi^2$ statistic increases with each added level of invariance, when estimated using the boring ol' ML estimator. But the robust estimates of the $\chi^2$ statistic are decreasing with each added level of invariance constraints, when estimated using the MLR estimator…

Best Answer

The source of your problem is the 'robust' estimation of standard errors using the robust Satorra-Bentler Chi-square statistic. When testing for measurement invariance, we compare less constrained (configural invariance) to more constrained (metric or scalar invariance) models. The comparison that is usually applied is a Chi-square difference test, which compares the Chi-square of a less constrained to a more constrained model, testing the null hypothesis that both models have the same fit.

In addition some authors argue that one may also look at the change in RMSEA or CFI, but there are no strong advices on which change in these statistics is desired. Therefore my advice is to first of all look at change in model Chi-square and the associated p-value for above mentioned null hypothesis. I will therefore first answer your question in terms of Chi-square change and then address the change in CFI and RMSEA

Testing the change in model Chi-square

MLR uses a scaled version of Chi-square to find robust standard errors following a paper by Satorra and Bentler in Psychometrica. The problem you are facing now is that, as you say, the (scaled) Chi-squares decrease across more constrained version of the model. In fact, the simple scaled Chi-square differences between your models is negative and thus undefined.

This behavior can be expected because the difference in scaled Chi-squares is not Chi-square distributed. A Chi-square difference test using scaled Chi-squares needs to be adapted before the Chi-square difference can be interpreted in the usual way. Specifically, the adjustment goes as follows. First we calculate a scaling correction factor:

$$s= (d_0c_0-d_1c_1)/(d_0-c_1)$$

where $d_0$ is the degrees of freedom of the nested (constrained) model and $d_1$ in the unconstrained model. Furthermore $c_1$ and $c_0$ are the scaling correction factors reported by lavaan or other SEM packages like Mplus. Subsequently, we calculate a corrected Chi-sqaure difference

$$ \Delta_{\chi} = (T_0c_0 - T_1c_1)/ s $$

where $T_0$ and $T_1$ are the scaled (robust) model Chi-squares. This adjusted Chi-square is then tested on a central Chi-square distribution with degrees of freedom equal to the difference in degrees of freedom of the two models.

To provide an example for your data for testing configural against metric invariance in R, we use a short script:

d0 = 488 # Enter data as in your output
d1 = 444
c0 = 1.186
c1 = 1.105
T0 = 861.367
T1=890.242

(cd  = (d0 * c0 - d1*c1)/(d0 - d1)) # scaling correction factor
[1] 2.003364

(TRd = (T0*c0 - T1*c1)/cd)          # Adjusted difference in model Chi-squares
[1] 18.90014

> (df  = d0-d1)                     # Difference in degrees of freedom
[1] 44

> 1 - pchisq(TRd,df)                # p-value
[1] 0.9996636

We can see that the scaled Chi-square difference is 18.9 (and now it has a positive sign!), which when tested with $\alpha=.05$ type-1 error probability is not significant. Hence there is evidence for metric invariance in your data.

There is a lot of documentation on this problem on the Mplus website. See here for a discussion of difference testing with scaled Chi-square. The correction I suggest is the simple adjustment variant which in some cases may still yield negative Chi-square. There is a more recent and more sophisticated approach called the strictly positive Chi-square difference. It is described on the Mplus website I linked.

Decrease in fit indices (RMSEA and CFI):

It was remarked that my answer did not yet sufficiently address the RMSEA and CFI increase that was observed over increasingly constrained versions of a baseline model. To understand this we first of all need to refer to the definitions of the two statistics:

$$RMSEA = \frac{(\chi ^2-df)^{\frac{1}{2}}}{df(n-1)}$$

and

$$CFI = \frac{ (\chi_0^2 - df_0) - (\chi_1^2 - df_1) }{ (\chi_0^2 - df_0)}$$

where $0$ and $1$ indicate the null model and the tested model respectively. It can be seen that both fit measures depend on $\chi^2$ and the $df$ of the model. The scaled $\chi^2$ is designed in a way to be more 'robust' to many practical problems, in particular the violation of multi-variate noramlity in continuous factor analysis. If we assume scaled $\chi^2$ is a more valid version than unscaled $\chi^2$, we may conclude that also ´scaled rmsea´ and ´scaled cfi´ are more precise versions. In lavaan you therefore need to check that you looked at the correct scaled rmsea and scaled cfi.

Assuming that you did this already, it can be seen from the definitions of the two indices that a decrease in RMSEA and CFI across more constraint versions of the model is actually possible, in fact it is desirable!

To see this, we first of all assume that the chi-square of the constrained and unconstrained models does not change. This means that the more strict model is true. However the number of parameters in the model decreases, thus $df$'s increase. Now let $a$ denote the unconstrained (e.g. configural) and $b$ the constrained (e.g. metric) model. So we know that $df_a<df_b$ while assuming $\chi^2_a =\chi^2_b = \chi^2$ (i.e. no decrease in fit / more constrained model is true). Now we wonder if it is possible whether

$$RMSEA_a > RMSEA_b$$

as well as

$$CFI_a > CFI_b$$

It is particularly easy to see this for $CFI$, because there we have

$$CFI_a > CFI_b \Leftrightarrow (\chi_a^2 - df_a) - (\chi_b^2 - df_b) > 0 \\ \Leftrightarrow (\chi^2 - df_a) - (\chi^2 - df_b) > 0 \\ \Leftrightarrow df_b > df_a $$

which is always true if $\chi^2_a =\chi^2_b = \chi^2$. Hence $CFI$ of the more constrained model can be smaller than that of the unconstrained model and necessarily is when fit of the two models is exctly equal. For RMSEA the situation is a little bit more complicated because the inequality involves squared terms of $\chi^2$, $df_a$ and $df_b$. This suggests that the solution under the assumption $\chi^2_a = \chi^2_b$ depends on their particular values, but under certain combinations the inequality will hold as well.

Hence in conclusion, what you observe is possible. In particular we are more likely to find it in situations when the model $\chi^2$ only marginally changes while the amoung of additionally constrained parameters is large. This is exectly the result we get when a more constrained model is the true model and the less constrained model was specified too 'flexible' (over-parametrized). Thus decrease in the two fit measures is even better news than a (small) increase!

Best Answer

Testing the change in model Chi-square

Decrease in fit indices (RMSEA and CFI):

Related Solutions

Solved – Testing for validity of a formative measurement model

Solved – In testing a structural model fit, how strongly do the measurement models fit influence the overall structural model fit

Related Question