I understand that scalar invariance, in the context of structural equation modeling (SEM), means that the intercepts of the observed variables loading on the same latent variable are invariant across groups. But what does scalar invariance mean substantively? What are its implications?
Solved – the substantive meaning of scalar invariance
Tags: measurement, structural-equation-modeling
Related Solutions
The intercept or mean of a latent variable is arbitrary, like the variance, and is usually fixed to zero if you have a single group model (or a single time point model). The intercept of the measured variable is the expected value when the predictor (the latent variable) is equal to zero.
You anchor the mean of the latent variable to the intercepts of the measured variables, and that is what lets you compare latent means across groups or over time. But if the intercepts of the measured variables drift apart, you can't anchor the latent means to them any more, because you no longer know where they are anchored.
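In symbols, the anchoring argument can be sketched as follows. The notation is an assumption of this sketch, not something the answer itself uses: $\tau$ for an intercept, $\lambda$ for a loading, $\eta$ for the latent variable, $\kappa$ for its mean, with indicator $j$ in group $g$ and residuals that have mean zero.

```latex
x_{jg} = \tau_{jg} + \lambda_{jg}\,\eta_g + \varepsilon_{jg},
\qquad
\mathrm{E}[x_{jg}] = \tau_{jg} + \lambda_{jg}\,\kappa_g,
\qquad
\kappa_g = \mathrm{E}[\eta_g].

% With equal loadings (\lambda_{j1} = \lambda_{j2} = \lambda_j):
\mathrm{E}[x_{j1}] - \mathrm{E}[x_{j2}]
  = (\tau_{j1} - \tau_{j2}) + \lambda_j\,(\kappa_1 - \kappa_2).
```

If the intercepts are equal, the first term vanishes and an observed mean difference reflects only the latent mean difference. If they are not, the observed difference mixes the two terms, and a single indicator cannot tell them apart.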
Enough analogies, let's have a concrete example.
Let's say you want to compare depression symptoms in men and women.
So you ask three questions: How many days in the past week have you:
- Felt lonely
- Felt sad
- Cried
I create a latent variable from these three items, and the loadings and error variances look good. Now I want to compare the means of the latent variable across groups, so I fix the male latent mean to zero and constrain the intercepts of the three measured variables to be equal across groups.
Women and men do not differ in how much they have felt lonely or how much they have felt sad, but we find that women say they have cried more than men.
Does that mean that the women have 'more' depression than the men? If we anchor to crying - yes. If we anchor to the other two variables - no. We don't have intercept invariance, and because of that, we can't compare the means of the latent variables.
Another (only slightly different) way to think about it. The intercept of a measured variable is the expected value of that variable when the factor is equal to zero. The predicted values of the measured variables should be the same for men and women when their factor values are equal (for instance, when the factor is zero). But here the predicted values are not all equal when the factors are equal. Two are equal (items 1 and 2, lonely and sad); one is not (item 3, cried).
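To make the anchoring problem concrete, here is a minimal sketch in Python. All numbers (loadings, intercepts) are hypothetical values chosen for illustration, not estimates from any data, and the two latent means are set equal on purpose. For each item, it computes the latent mean difference that item would imply if its intercept were treated as invariant.

```python
# Hypothetical population values, chosen only for illustration.
items = ["lonely", "sad", "cried"]
loadings = {"lonely": 1.0, "sad": 0.9, "cried": 0.8}  # equal across groups
intercepts_men = {"lonely": 2.0, "sad": 2.5, "cried": 0.5}
intercepts_women = {"lonely": 2.0, "sad": 2.5, "cried": 1.3}  # only "cried" differs
latent_mean_men = 0.0
latent_mean_women = 0.0  # equally depressed on average, by construction


def expected_item_mean(intercept, loading, latent_mean):
    """Expected observed score: intercept + loading * latent mean."""
    return intercept + loading * latent_mean


def implied_latent_gap(item):
    """Latent mean difference (women - men) implied by one item,
    if that item's intercept is (wrongly or rightly) assumed invariant."""
    mean_m = expected_item_mean(intercepts_men[item], loadings[item], latent_mean_men)
    mean_w = expected_item_mean(intercepts_women[item], loadings[item], latent_mean_women)
    return (mean_w - mean_m) / loadings[item]


implied_gaps = {item: implied_latent_gap(item) for item in items}
for item, gap in implied_gaps.items():
    print(f"anchor on {item!r}: implied latent mean difference = {gap:.2f}")
```

Anchoring on "lonely" or "sad" implies no latent difference, while anchoring on "cried" implies a positive one, even though the latent means were constructed to be identical.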
Multi-group SEM Interpretation – How to Interpret Group Differences with Weak Measurement Invariance
Rather than discretizing age into arbitrary categories (which can have negative consequences; MacCallum et al., 2002), you could keep using age as a predictor of the latent factors. This is called a multiple-indicator multiple-cause (MIMIC) model, and it can be used to test invariance as an alternative to multigroup CFA (see Kolbe et al., 2019, 2021, for reviews of single-group approaches for invariance tests).
Regarding your questions:
- In the MIMIC model, you can test for differential item/indicator functioning (DIF) by additionally regressing an indicator on age. If there is a significant direct effect of age on an indicator, then the indicator mean is a function of age even among subjects with the same factor score (standard interpretation of a partial regression slope). If not, then the relationship between age and an indicator is fully mediated by the factor (Montoya & Jeon, 2020).
- Comparison of regression slopes only requires metric ("weak") invariance.
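The DIF logic in the first point can be illustrated with a small simulation. This is a sketch under stated assumptions, not a MIMIC analysis: the parameter values are hypothetical, and the regression uses the true simulated factor values, which are never observed in real data (in practice the direct effect is estimated inside the SEM itself). It only shows what "a direct effect of age among subjects with the same factor score" means as a partial regression slope.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

N = 20000
GAMMA = 0.5       # assumed effect of age on the latent factor
LOADING = 0.8     # assumed loading, same for both indicators
DIRECT_AGE = 0.3  # assumed direct effect of age on the DIF indicator

age = [random.gauss(0, 1) for _ in range(N)]            # standardized age
factor = [GAMMA * a + random.gauss(0, 1) for a in age]  # factor depends on age

# Indicator without DIF: age acts only through the factor.
clean = [1.0 + LOADING * f + random.gauss(0, 0.5) for f in factor]
# Indicator with uniform DIF: age also has a direct effect.
biased = [1.0 + LOADING * f + DIRECT_AGE * a + random.gauss(0, 0.5)
          for f, a in zip(factor, age)]


def partial_slopes(y, x1, x2):
    """OLS slopes of y on x1 and x2 (with intercept), via the 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


_, age_slope_clean = partial_slopes(clean, factor, age)
_, age_slope_biased = partial_slopes(biased, factor, age)
print(f"direct age effect, indicator without DIF: {age_slope_clean:.3f}")
print(f"direct age effect, indicator with DIF:    {age_slope_biased:.3f}")
```

For the indicator without DIF, the age slope is close to zero once the factor is held constant (full mediation); for the indicator with DIF, it recovers roughly the assumed direct effect.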
If you use the MIMIC approach, product indicators can be used to define latent interactions with age (see Kolbe & Jorgensen, 2017, for a lavaan example).
Moderated nonlinear factor analysis (MNLFA) is a more intuitive option (Bauer, 2017; Bauer et al., 2020), but it is not available in lavaan. It should be possible using OpenMx.
References:
- Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 22(3), 507–526. https://doi.org/10.1037/met0000077
- Bauer, D. J., Belzak, W. C., & Cole, V. T. (2020). Simplifying the assessment of measurement invariance over multiple background variables: Using regularized moderated nonlinear factor analysis to detect differential item functioning. Structural Equation Modeling, 27(1), 43–55. https://doi.org/10.1080/10705511.2019.1642754
- Kolbe, L., & Jorgensen, T. D. (2017). Using product indicators in restricted factor analysis models to detect nonuniform measurement bias. In Quantitative Psychology (pp. 235–245). Springer. https://doi.org/10.1007/978-3-319-77249-3_20
- Kolbe, L., & Jorgensen, T. D. (2019). Using restricted factor analysis to select anchor items and detect differential item functioning. Behavior Research Methods, 51(1), 138–151. https://doi.org/10.3758/s13428-018-1151-3
- Kolbe, L., Jorgensen, T. D., & Molenaar, D. (2021). The impact of unmodeled heteroskedasticity on assessing measurement invariance in single-group models. Structural Equation Modeling, 28(1), 82–98. https://doi.org/10.1080/10705511.2020.1766357
- Montoya, A. K., & Jeon, M. (2020). MIMIC Models for Uniform and Nonuniform DIF as Moderated Mediation Models. Applied Psychological Measurement, 44(2), 118–136. https://doi.org/10.1177/0146621619835496
Best Answer
It means that people in the two groups who have the same score on the latent variable also have the same expected score on each observed variable: the groups do not differ in their intercepts.
Say you're comparing two racial/ethnic groups on a measure of ability that's used in job selection. You find that you don't have scalar invariance for one item. That means that one group finds that item easier than the other group does, even at the same level of ability. So if you take the total score, it is biased in favor of that group. There's an example of that here: https://www.talentqgroup.com/media/84831/policy_assessment_and_the_law-march-2013-.pdf (look at the British Rail example).
Second example: You want to measure depression, so you ask about crying. Women cry more than men, whether they're depressed or not. Women are therefore going to get higher scores on the measure of depression, even if they're equally depressed.
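A quick simulation shows how this plays out for a total score. It is a sketch with hypothetical loadings and intercepts (not estimates from real data), it gives both groups the same distribution of latent depression by construction, and it ignores the bounded 0 to 7 response scale for simplicity.

```python
import random
from statistics import fmean

random.seed(7)  # fixed seed so the sketch is reproducible

LOADINGS = (1.0, 0.9, 0.8)          # lonely, sad, cried (hypothetical)
INTERCEPTS_MEN = (2.0, 2.5, 0.5)
INTERCEPTS_WOMEN = (2.0, 2.5, 1.3)  # only the crying intercept is higher


def total_score(depression, intercepts):
    """Sum of the three item scores for one person."""
    return sum(tau + lam * depression + random.gauss(0, 0.5)
               for tau, lam in zip(intercepts, LOADINGS))


def simulate_group(intercepts, n=20000):
    # Same latent distribution in both groups: standard normal depression.
    return [total_score(random.gauss(0, 1), intercepts) for _ in range(n)]


men = simulate_group(INTERCEPTS_MEN)
women = simulate_group(INTERCEPTS_WOMEN)
score_gap = fmean(women) - fmean(men)
print(f"mean total score gap (women - men): {score_gap:.2f}")
```

The gap in mean total scores is close to the difference in the crying intercepts, even though the two groups are equally depressed in the simulation.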