The source of your problem is the 'robust' estimation, which replaces the usual Chi-square with the Satorra-Bentler scaled Chi-square statistic. When testing for measurement invariance, we compare less constrained (configural invariance) with more constrained (metric or scalar invariance) models. The comparison that is usually applied is a Chi-square difference test, which compares the Chi-square of the less constrained model with that of the more constrained one, testing the null hypothesis that both models fit equally well.
In addition, some authors argue that one may also look at the change in RMSEA or CFI, but there are no firm guidelines on how large a change in these statistics is acceptable. My advice is therefore to look first of all at the change in model Chi-square and the associated p-value for the above-mentioned null hypothesis. I will thus first answer your question in terms of the Chi-square change and then address the change in CFI and RMSEA.
Testing the change in model Chi-square
MLR uses a scaled version of the Chi-square to obtain robust standard errors, following a paper by Satorra and Bentler in Psychometrika. The problem you are facing is that, as you say, the (scaled) Chi-squares decrease across more constrained versions of the model. In fact, the simple scaled Chi-square difference between your models is negative and thus undefined.
This behavior can be expected because the difference in scaled Chi-squares is not Chi-square distributed. A Chi-square difference test using scaled Chi-squares needs to be adapted before the Chi-square difference can be interpreted in the usual way. Specifically, the adjustment goes as follows. First we calculate a scaling correction factor:
$$s= (d_0c_0-d_1c_1)/(d_0-d_1)$$
where $d_0$ is the degrees of freedom of the nested (constrained) model and $d_1$ that of the unconstrained model. Furthermore, $c_0$ and $c_1$ are the scaling correction factors reported by lavaan or other SEM packages such as Mplus. Subsequently, we calculate a corrected Chi-square difference
$$ \Delta_{\chi} = (T_0c_0 - T_1c_1)/ s $$
where $T_0$ and $T_1$ are the scaled (robust) model Chi-squares. This adjusted Chi-square is then tested against a central Chi-square distribution with degrees of freedom equal to the difference in degrees of freedom of the two models.
To provide an example for your data, testing configural against metric invariance in R, we use a short script:
d0 <- 488     # df of the constrained (metric) model, as in your output
d1 <- 444     # df of the unconstrained (configural) model
c0 <- 1.186   # scaling correction factor of the constrained model
c1 <- 1.105   # scaling correction factor of the unconstrained model
T0 <- 861.367 # scaled Chi-square of the constrained model
T1 <- 890.242 # scaled Chi-square of the unconstrained model

(cd <- (d0 * c0 - d1 * c1) / (d0 - d1)) # scaling correction factor
# [1] 2.003364
(TRd <- (T0 * c0 - T1 * c1) / cd)       # adjusted difference in model Chi-squares
# [1] 18.90014
(df <- d0 - d1)                         # difference in degrees of freedom
# [1] 44
1 - pchisq(TRd, df)                     # p-value
# [1] 0.9996636
We can see that the scaled Chi-square difference is 18.9 (and it now has a positive sign!), which, when tested at a Type I error probability of $\alpha=.05$, is not significant. Hence there is no evidence against metric invariance in your data.
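For repeated use across several invariance steps, the computation can be wrapped in a small base-R helper; the function name `scaled_chisq_diff` is my own, not part of any package:

```r
# Satorra-Bentler (2001) scaled Chi-square difference test.
# T0, c0, d0: scaled Chi-square, scaling correction factor and df
#             of the constrained (nested) model;
# T1, c1, d1: the same quantities for the unconstrained model.
scaled_chisq_diff <- function(T0, c0, d0, T1, c1, d1) {
  cd  <- (d0 * c0 - d1 * c1) / (d0 - d1)  # scaling correction of the difference
  TRd <- (T0 * c0 - T1 * c1) / cd         # adjusted Chi-square difference
  df  <- d0 - d1
  c(chisq_diff = TRd, df = df, p_value = 1 - pchisq(TRd, df))
}

# The values from the script above reproduce the same result:
scaled_chisq_diff(T0 = 861.367, c0 = 1.186, d0 = 488,
                  T1 = 890.242, c1 = 1.105, d1 = 444)
# chisq_diff ~ 18.90, df = 44, p ~ 0.9997
```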
There is extensive documentation on this problem on the Mplus website. See here for a discussion of difference testing with scaled Chi-square. The correction I suggest is the simple adjustment, which in some cases may still yield a negative Chi-square difference. There is a more recent and more sophisticated approach, the strictly positive Chi-square difference test, which is also described on the linked Mplus page.
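As a convenience, recent versions of lavaan can carry out this scaled difference test for you: if I remember the interface correctly, `lavTestLRT()` (also reachable via `anova()`) applies a Satorra-Bentler-type correction by default when the models were fitted with `estimator = "MLR"`. A sketch using lavaan's built-in `HolzingerSwineford1939` data rather than your own:

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6 '

# Configural model: same pattern of loadings in both groups
fit.config <- cfa(HS.model, data = HolzingerSwineford1939,
                  group = "school", estimator = "MLR")

# Metric model: loadings constrained to be equal across groups
fit.metric <- cfa(HS.model, data = HolzingerSwineford1939,
                  group = "school", estimator = "MLR",
                  group.equal = "loadings")

# Scaled Chi-square difference test
lavTestLRT(fit.config, fit.metric)
```

If the default correction yields a negative difference, `lavTestLRT()` also accepts a `method` argument (I believe `method = "satorra.bentler.2010"`) that implements the strictly positive variant mentioned above; check `?lavTestLRT` for the exact options of your lavaan version.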
Decrease in fit indices (RMSEA and CFI):
It was remarked that my answer did not yet sufficiently address the changes in RMSEA and CFI that were observed over increasingly constrained versions of a baseline model. To understand this, we first of all need to refer to the definitions of the two statistics:
$$RMSEA = \sqrt{\frac{\max(\chi^2-df,\,0)}{df\,(n-1)}}$$
and
$$CFI = \frac{ (\chi_0^2 - df_0) - (\chi_1^2 - df_1) }{ (\chi_0^2 - df_0)}$$
where $0$ and $1$ indicate the null model and the tested model, respectively. It can be seen that both fit measures depend on the $\chi^2$ and the $df$ of the model. The scaled $\chi^2$ is designed to be more 'robust' against many practical problems, in particular the violation of multivariate normality in continuous factor analysis. If we regard the scaled $\chi^2$ as a more valid version than the unscaled $\chi^2$, we may conclude that the scaled RMSEA and scaled CFI are likewise more precise versions. In lavaan you therefore need to check that you looked at the scaled RMSEA and the scaled CFI.
Assuming that you did this already, it can be seen from the definitions of the two indices that a decrease in RMSEA combined with an increase in CFI across more constrained versions of the model is actually possible; in fact, it is desirable!
To see this, first assume that the Chi-square of the constrained and the unconstrained model does not change; this is what we expect when the stricter model is true. The number of free parameters nevertheless decreases, so the $df$ increase. Now let $a$ denote the unconstrained (e.g. configural) and $b$ the constrained (e.g. metric) model. We thus know that $df_a<df_b$ while assuming $\chi^2_a =\chi^2_b = \chi^2$ (i.e. no decrease in fit; the more constrained model is true). Now we ask whether it is possible that
$$RMSEA_a > RMSEA_b$$
as well as
$$CFI_a < CFI_b$$
It is particularly easy to see this for $CFI$, because there we have
$$CFI_a < CFI_b \Leftrightarrow (\chi_a^2 - df_a) - (\chi_b^2 - df_b) > 0 \\
\Leftrightarrow (\chi^2 - df_a) - (\chi^2 - df_b) > 0 \\
\Leftrightarrow df_b > df_a $$
which is always true under our assumption. Hence the $CFI$ of the more constrained model can be larger than that of the unconstrained model, and it necessarily is when the fit of the two models is exactly equal. For RMSEA, note that the definition can be rewritten as $RMSEA=\sqrt{(\chi^2/df-1)/(n-1)}$ whenever $\chi^2>df$; for fixed $\chi^2$ this is decreasing in $df$, so under the assumption $\chi^2_a = \chi^2_b$ we also have $RMSEA_a > RMSEA_b$.
Hence, in conclusion, what you observe is possible. In particular, we are more likely to find it in situations where the model $\chi^2$ changes only marginally while the number of additionally constrained parameters is large. This is exactly the result we get when the more constrained model is the true model and the less constrained model was specified too 'flexibly' (over-parametrized). Thus an improvement in the two fit measures (a lower RMSEA and a higher CFI) is even better news than a (small) deterioration!
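The direction of these inequalities can also be checked numerically in base R, directly from the definitions above (a sketch with made-up values for $n$ and the null model; the $df$ are those from the example):

```r
n <- 300; chisq <- 890       # common model Chi-square (held equal by assumption)
chisq0 <- 3000; df0 <- 550   # null model Chi-square and df (made-up values)
df_a <- 444; df_b <- 488     # configural vs. metric df, as in the example

rmsea <- function(chisq, df) sqrt(max(chisq - df, 0) / (df * (n - 1)))
cfi   <- function(chisq, df) ((chisq0 - df0) - (chisq - df)) / (chisq0 - df0)

rmsea(chisq, df_a) > rmsea(chisq, df_b)  # TRUE: RMSEA falls as df increase
cfi(chisq, df_a)   < cfi(chisq, df_b)    # TRUE: CFI rises as df increase
```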