Structural Equation Modeling – Estimating Direct and Total Effects with Regressions and SEM (lavaan)


I am reading about the mediation test example from the lavaan package here:

http://lavaan.ugent.be/tutorial/mediation.html

Specifically, they fit the model:

library(lavaan)
set.seed(1234)
X <- rnorm(100)
M <- 0.5*X + rnorm(100)
Y <- 0.7*M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)
model <- ' # direct effect
             Y ~ c*X
           # mediator
             M ~ a*X
             Y ~ b*M
           # indirect effect (a*b)
             ab := a*b
           # total effect
             total := c + (a*b)
         '
fit <- sem(model, data = Data)
summary(fit)

which produces the following output:

Regressions:
                   Estimate  Std.Err  Z-value  P(>|z|)
  Y ~                                                 
    X          (c)    0.036    0.104    0.348    0.728
  M ~                                                 
    X          (a)    0.474    0.103    4.613    0.000
  Y ~                                                 
    M          (b)    0.788    0.092    8.539    0.000

Variances:
                   Estimate  Std.Err  Z-value  P(>|z|)
    Y                 0.898    0.127    7.071    0.000
    M                 1.054    0.149    7.071    0.000

Defined Parameters:
                   Estimate  Std.Err  Z-value  P(>|z|)
    ab                0.374    0.092    4.059    0.000
    total             0.410    0.125    3.287    0.001

The summary shows that the so-called "direct effect", which is expressed as a bivariate regression of $Y$ on $X$, is labeled "c", and that the total effect is the sum of the direct effect and the indirect effect. I was under the impression that regressing an outcome on an exposure without adjusting for the mediator gives the total effect, and that once the mediator is included in the model, the conditional effect of the exposure is the direct effect. Indeed, this is what happens when one fits linear regression models.

Running lm(Y~X, data=Data) gives:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   0.1929     0.1274   1.513  0.13339   
X             0.4100     0.1260   3.254  0.00156 **

This is the regression that the lavaan example appears to call the "direct" effect, yet its coefficient numerically equals lavaan's "total" effect. To get lavaan's "direct" effect, I fit the model lm(Y~X+M) and get:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.16358    0.09747   1.678   0.0965 .  
X            0.03635    0.10605   0.343   0.7325    
M            0.78832    0.09373   8.410 3.58e-13 ***

Here the coefficient of X is conditional on M.

Can someone explain how SEM "sequentially conditions" the regression models as specified?

Best Answer

Going back to mediation as defined by Baron & Kenny (1986), you would typically test a mediation model using separate regression equations. Under this approach the researcher seeks to demonstrate a series of conditions which, if they hold, constitute evidence of mediation. The first condition is demonstrated by your first regression model, lm(Y~X, data=Data). This shows that there is a significant overall effect of $X$ on $Y$ (side note: this is also the total effect).

The next condition to test is to show that $X$ is significantly related to $M$. This is missing from your modeling, but can be easily estimated based on the data you generated.

> summary(lm(M~X))

Call:
lm(formula = M ~ X)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.88626 -0.61401  0.00236  0.58645  2.98774 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.03715    0.10498   0.354    0.724    
X            0.47392    0.10378   4.567 1.44e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.037 on 98 degrees of freedom
Multiple R-squared:  0.1755,    Adjusted R-squared:  0.1671 
F-statistic: 20.85 on 1 and 98 DF,  p-value: 1.442e-05

Here we can see that the second condition of mediation is indeed met, as $X$ significantly predicts $M$. Also note that you get the same unstandardized regression coefficient, 0.474, in this model as you do in the SEM above.

Following the Baron and Kenny approach, the next two conditions are tested simultaneously in one final regression model. First, controlling for $X$, $M$ should significantly predict $Y$, which it does in your lm(Y~X+M) model above (coefficient 0.788). Second, the relation between $X$ and $Y$ should be attenuated when controlling for $M$. This condition is also met in that model: the coefficient of $X$ drops from 0.410 to 0.036.
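The Baron and Kenny steps described so far can be sketched as three separate lm() fits on the simulated data from the question. The object names (fit_total, fit_mediator, fit_direct, c_total, c_prime) are chosen here for illustration and are not part of the original post.

```r
# Recreate the simulated data from the question
set.seed(1234)
X <- rnorm(100)
M <- 0.5 * X + rnorm(100)
Y <- 0.7 * M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)

# Step 1: X predicts Y, ignoring M (total effect)
fit_total <- lm(Y ~ X, data = Data)
# Step 2: X predicts the mediator M (path a)
fit_mediator <- lm(M ~ X, data = Data)
# Steps 3 and 4: M predicts Y controlling for X (path b),
# and the coefficient of X is attenuated (conditional effect of X)
fit_direct <- lm(Y ~ X + M, data = Data)

c_total <- coef(fit_total)[["X"]]
a       <- coef(fit_mediator)[["X"]]
b       <- coef(fit_direct)[["M"]]
c_prime <- coef(fit_direct)[["X"]]

# Collect the four path estimates in one place
round(c(c_total = c_total, a = a, b = b, c_prime = c_prime), 3)

# Attenuation check: the conditional effect is smaller in magnitude
abs(c_prime) < abs(c_total)
```

If the seed behaves as in the question, the four estimates should match the regression output quoted there (roughly 0.410, 0.474, 0.788 and 0.036).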

To test the significance of the mediation (i.e., the indirect effect), researchers tended to use the Sobel test, at least when the full Baron and Kenny approach was still in vogue. To perform this test you take the effect of $X$ on $M$ (from regression model #2) and multiply it by the effect of $M$ on $Y$, controlling for $X$. To assess significance you also need the standard errors of these two regression coefficients.
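As a rough sketch of that calculation, the Sobel test can be done by hand from the two lm() fits. This uses the common first-order standard error, sqrt(b^2 * se_a^2 + a^2 * se_b^2); the variable names are illustrative only.

```r
# Recreate the simulated data from the question
set.seed(1234)
X <- rnorm(100)
M <- 0.5 * X + rnorm(100)
Y <- 0.7 * M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)

# Coefficient tables for the mediator model and the outcome model
tab_a <- summary(lm(M ~ X, data = Data))$coefficients
tab_b <- summary(lm(Y ~ X + M, data = Data))$coefficients

a    <- tab_a["X", "Estimate"]
se_a <- tab_a["X", "Std. Error"]
b    <- tab_b["M", "Estimate"]
se_b <- tab_b["M", "Std. Error"]

# Indirect effect and its first-order (Sobel) standard error
ab    <- a * b
se_ab <- sqrt(b^2 * se_a^2 + a^2 * se_b^2)

# z statistic and two-sided p-value from the normal distribution
z <- ab / se_ab
p <- 2 * pnorm(-abs(z))

c(ab = ab, se = se_ab, z = z, p = p)
```

The result should land close to the "ab" row of the lavaan output above (estimate 0.374, standard error 0.092). Small differences in the standard error are expected, since lm() and lavaan's maximum likelihood fit do not compute standard errors in exactly the same way.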

An SEM tests all of these same conditions and can assess the significance of the mediation in essentially one step. Note that all of the elements needed for a Sobel test are in the single SEM. Also note that all of the coefficients are essentially the same as those you calculated using the linear regression framework originally proposed by Baron and Kenny.
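To make the correspondence concrete, here is a sketch that refits the lavaan model from the question and lines its estimates up against the lm() coefficients. It assumes the lavaan package is installed. The helper lav_est() and the comparison data frame are names introduced here for illustration.

```r
library(lavaan)

# Recreate the simulated data from the question
set.seed(1234)
X <- rnorm(100)
M <- 0.5 * X + rnorm(100)
Y <- 0.7 * M + rnorm(100)
Data <- data.frame(X = X, Y = Y, M = M)

# Same model syntax as in the question
model <- '
  Y ~ c*X
  M ~ a*X
  Y ~ b*M
  ab := a*b
  total := c + (a*b)
'
fit <- sem(model, data = Data)
pe  <- parameterEstimates(fit)

# Hypothetical helper: pull one estimate by left-hand side, operator
# and (optionally) right-hand side of the parameter row
lav_est <- function(lhs, op, rhs = NULL) {
  keep <- pe$lhs == lhs & pe$op == op
  if (!is.null(rhs)) keep <- keep & pe$rhs == rhs
  pe$est[keep]
}

fit_total  <- lm(Y ~ X, data = Data)
fit_med    <- lm(M ~ X, data = Data)
fit_direct <- lm(Y ~ X + M, data = Data)

comparison <- data.frame(
  quantity = c("Y ~ X (label c)", "M ~ X (label a)",
               "Y ~ M (label b)", "total"),
  lavaan   = c(lav_est("Y", "~", "X"), lav_est("M", "~", "X"),
               lav_est("Y", "~", "M"), lav_est("total", ":=")),
  lm       = c(coef(fit_direct)[["X"]], coef(fit_med)[["X"]],
               coef(fit_direct)[["M"]], coef(fit_total)[["X"]])
)
comparison
```

The row for label c is paired with lm(Y ~ X + M), not lm(Y ~ X), and the "total" row is paired with lm(Y ~ X). This is the pairing under which the numbers in the question agree.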

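Perhaps the issue comes down to parlance? The total effect in a mediation model is sometimes denoted $c$ (it is necessarily the unconditional effect of $X$ on $Y$). The direct effect of $X$ on $Y$ conditioned on $M$ is often denoted $c'$, where $c' + a*b = c$. In the lavaan syntax above, the two lines Y ~ c*X and Y ~ b*M share the same outcome and are combined into a single regression of $Y$ on both $X$ and $M$, so the coefficient lavaan labels "c" is the conditional effect (what is called $c'$ here). That is why it matches lm(Y~X+M) and not lm(Y~X). These are terminology conventions used by researchers in reference to mediation models specifically and may not translate perfectly to all other uses of the terms "total effect" and "direct effect" in statistics.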
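One way to see where the decomposition comes from is to write out the three regressions, with $c$ the unconditional effect of $X$ on $Y$ and $c'$ the effect conditional on $M$. The intercepts $i_1, i_2, i_3$ and error terms $e_1, e_2, e_3$ are notation introduced here for the sketch:

$$
\begin{aligned}
Y &= i_1 + c\,X + e_1 \\
M &= i_2 + a\,X + e_2 \\
Y &= i_3 + c'\,X + b\,M + e_3
\end{aligned}
$$

Substituting the second equation into the third gives

$$
Y = (i_3 + b\,i_2) + (c' + a\,b)\,X + (b\,e_2 + e_3),
$$

and comparing the coefficient of $X$ with the first equation yields $c = c' + a\,b$. With the estimates reported above, $0.036 + 0.474 \times 0.788 \approx 0.410$, which agrees with both lavaan's "total" row and the coefficient from lm(Y~X) up to rounding.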
