Solved – multiple group model vs moderated regression

interaction · path-model · regression · structural-equation-modeling

I have questions about whether a colleague's statistical approach is appropriate. They are looking at whether the effects of 9 continuous predictors on a continuous outcome differ between 3 natural / non-assigned groups. All variables are directly measured – no latent variables. They do this by examining multiple group models in Mplus as follows:

  1. Estimate a model in which the outcome is regressed on all 9 predictors, separately in group A and group B, but with all regression paths constrained to be equal between groups A and B. (Intercepts and variances not constrained.)
  2. Estimate the same model, but with the regression path for predictor 1 only freed to vary between groups.
  3. If chi-square difference between the two models is significant, conclude that the paths significantly differ.
  4. Repeat steps 2 and 3 for predictors 2 – 9.

  5. Repeat steps 1 through 4, but this time in groups A and C.
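If it helps, here is my understanding of their procedure translated into lavaan syntax. This is a hypothetical sketch with made-up data and two predictors (x1, x2) standing in for the nine, and groups A and B only:

```r
library(lavaan)  # assumes the lavaan package is installed

# Made-up data standing in for theirs: outcome y, predictors x1 and x2,
# grouping variable g
set.seed(1)
dat <- data.frame(g  = rep(c("A", "B"), each = 200),
                  x1 = rnorm(400),
                  x2 = rnorm(400))
dat$y <- dat$x1 + 0.5 * dat$x2 + rnorm(400)

# Step 1: all regression paths constrained equal across groups
# (intercepts and variances left free)
mod.constrained <- "y ~ c(b1, b1) * x1 + c(b2, b2) * x2"
fit.constrained <- sem(mod.constrained, data = dat, group = "g")

# Step 2: free the path for predictor 1 only
mod.free1 <- "y ~ x1 + c(b2, b2) * x2"
fit.free1 <- sem(mod.free1, data = dat, group = "g")

# Step 3: 1-df chi-square difference test for that path
anova(fit.free1, fit.constrained)
```

Steps 2 and 3 would then be repeated for each remaining predictor in turn.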

My basic question is whether this seems like an appropriate approach. More specifically:

  1. Is there a good paper or reference describing this method so I can understand it better? I have a very basic understanding of SEM (took a course a few years ago but never really used it) and understand the logic of their approach, but I don't have a strong enough understanding to know if they're really using it appropriately.
  2. Would outcomes from this approach differ radically from results of moderated regression? i.e., using dummy codes for group, computing interaction terms, and using simple slopes to follow up interactions? I feel like moderated regression would be a lot "cleaner," but I may be biased by familiarity. Is there any reason to prefer either their multiple groups modeling approach or more standard moderated regression?
  3. Is a significant chi-square difference test sufficient to conclude there's a meaningful difference between paths? Is it overly sensitive to sample size? Their groups are approx 150 to 225 people each.
  4. Do they need to demonstrate measurement invariance before testing differences between individual paths? I believe this is necessary when testing paths among latent variables (?), but maybe not if only manifest variables are included?

Best Answer

  1. I don't know of a paper. It's the sort of thing that's clear enough to anyone who knows SEM that nobody writes it up; if anyone did submit it to a journal, reviewers would say "This is obvious."

  2. The two methods can give identical results if you add the right constraints to the SEM. The multiple-group approach makes fewer assumptions about homogeneity of variance, so it might be preferable. (To make the two models exactly equivalent, you need to constrain the residual variance of y to be equal across groups in the SEM: it can differ between the two groups in the SEM approach, but it can't in moderated regression.)

Here's an example (using lavaan, in R). Everything from the moderated regression can be seen in the lavaan output.

> library(lavaan)
> set.seed(1234)
> df <- data.frame(x = rnorm(1000))
> df$m <- c(rep(0, 500), rep(1, 500))
> df$y <- df$x + rnorm(1000) + df$m + df$m * df$x + rnorm(1000)
> 
> summary(lm(y ~ x + m + x * m, data = df))

Call:
lm(formula = y ~ x + m + x * m, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4406 -0.9659  0.0093  0.9167  4.4616 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.08244    0.06193   1.331    0.183    
x            1.06196    0.05991  17.727   <2e-16 ***
m            0.92634    0.08766  10.568   <2e-16 ***
x:m          1.01805    0.08815  11.548   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.385 on 996 degrees of freedom
Multiple R-squared:  0.5902,    Adjusted R-squared:  0.5889 
F-statistic: 478.1 on 3 and 996 DF,  p-value: < 2.2e-16

> 
> mod.1 <- "
+   y ~ c(a, b) * x
+   y ~~ c(v1, v1) * y  # This step needed for exact equivalence
+   y ~ c(int1, int2) * 1
+ 
+   modEff := a - b
+   mEff := int1 - int2
+ "
> 
> fit.1 <- sem(mod.1, data = df,
+                 group = "m")
> summary(fit.1)
lavaan (0.5-18) converged normally after  15 iterations

  Number of observations per group         
  0                                                500
  1                                                500

  Estimator                                         ML
  Minimum Function Test Statistic                0.499
  Degrees of freedom                                 1
  P-value (Chi-square)                           0.480

Chi-square for each group:

  0                                              0.244
  1                                              0.255

Parameter estimates:

  Information                                 Expected
  Standard Errors                             Standard

Group 1 [0]:

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  y ~
    x         (a)     1.062    0.060   17.762    0.000

Intercepts:
    y      (int1)     0.082    0.062    1.334    0.182

Variances:
    y        (v1)     1.910    0.085



Group 2 [1]:

                   Estimate  Std.err  Z-value  P(>|z|)
Regressions:
  y ~
    x         (b)     2.080    0.065   32.228    0.000

Intercepts:
    y      (int2)     1.009    0.062   16.294    0.000

Variances:
    y        (v1)     1.910    0.085


Defined parameters:
    modEff           -1.018    0.088  -11.572    0.000
    mEff             -0.926    0.087  -10.589    0.000

(Actually, I didn't do the final step of constraining the slope and testing with the anova() function; I just used the defined difference. That's left as an exercise for the reader. The result will be the same p-value, but with no parameter estimate or standard error.)
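For completeness, the exercise would look something like this in lavaan (recreating df from above, then comparing the free-slope model against one with the slope constrained equal across groups):

```r
library(lavaan)

# Recreate the data from above
set.seed(1234)
df <- data.frame(x = rnorm(1000))
df$m <- c(rep(0, 500), rep(1, 500))
df$y <- df$x + rnorm(1000) + df$m + df$m * df$x + rnorm(1000)

# Free-slope model (fit.1 above) and the same model with the
# slope constrained equal across groups
mod.free <- "
  y ~ c(a, b) * x
  y ~~ c(v1, v1) * y
  y ~ c(int1, int2) * 1
"
mod.constrained <- "
  y ~ c(a, a) * x       # slope forced equal in both groups
  y ~~ c(v1, v1) * y
  y ~ c(int1, int2) * 1
"
fit.free        <- sem(mod.free, data = df, group = "m")
fit.constrained <- sem(mod.constrained, data = df, group = "m")

# 1-df likelihood-ratio test of the slope difference; asymptotically
# equivalent to the Wald test on modEff above
anova(fit.constrained, fit.free)
```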

  3. Chi-square gives a p-value. If you don't trust the p-value from the chi-square, you shouldn't trust the p-value from the regression model either: it's the same estimate of the interaction effect, the same standard error, and the same p-value whichever method you use. Any sensitivity to sample size therefore applies equally to both.

  4. Not really; measurement invariance is an issue when you have latent variables, and all of these variables are manifest. But you can test the homogeneity of variance assumption, and it's easy to relax. In regression you make the assumption and you're stuck with it; in SEM the assumption is testable. I added the equality constraint on the residual variance to the model above, which is why it has 1 df. That chi-square test is not significant, so there's no evidence the assumption was violated. But you don't need to impose that constraint, and I suspect most people wouldn't.
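To relax the constraint, just give the residual variance a different label in each group. The model then becomes saturated (0 df), and comparing it to the constrained version gives the 1-df test of the assumption (a sketch recreating df from above; v2 is a label I'm introducing here):

```r
library(lavaan)

# Recreate the data from above
set.seed(1234)
df <- data.frame(x = rnorm(1000))
df$m <- c(rep(0, 500), rep(1, 500))
df$y <- df$x + rnorm(1000) + df$m + df$m * df$x + rnorm(1000)

# Residual variance of y free to differ between groups (v1 vs v2)
mod.het <- "
  y ~ c(a, b) * x
  y ~~ c(v1, v2) * y
  y ~ c(int1, int2) * 1
"
# Residual variance constrained equal (mod.1 above)
mod.hom <- "
  y ~ c(a, b) * x
  y ~~ c(v1, v1) * y
  y ~ c(int1, int2) * 1
"
fit.het <- sem(mod.het, data = df, group = "m")
fit.hom <- sem(mod.hom, data = df, group = "m")

# 1-df test of the homogeneity of variance assumption
anova(fit.hom, fit.het)
```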
