Comparative Analysis – Structural Equation Models vs OLS Regressions

Tags: least squares, method-comparison, structural-equation-modeling

Are results obtained from a structural equation model (SEM) comparable to those from a series of separate regressions?

Or, conversely, can I obtain the direct, indirect, and total effects of an independent variable of interest by running three separate regressions (e.g. OLS)?

Let IV = independent variable, MV = mediating variable, and DV = dependent variable. I run three separate regressions.

  1. DV = B0 + B1*MV + B2*IV + Controls
    B1 is the direct effect of MV on DV, while B2 is the direct effect of IV on DV.

  2. MV = K0 + K1*IV + Controls
    K1 is the direct effect of IV on MV.

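  3. DV = P0 + P1*IV + Controls
    P1 is the total effect of IV on DV, since MV is omitted here. The indirect effect is the product K1*B1, not a coefficient of this regression.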

The notation used in the mediation analysis below is based on that used in this handbook:
Hayes, A. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York, US.

  • K1 = a is the direct effect of IV on MV
  • B1 = b is the direct effect of MV on DV
  • B2 = c' is the direct effect of IV on DV
  • B2+B1*K1 = c'+a*b is the total effect of IV on DV

In turn, the total effect, that is, c'+a*b, should equal the coefficient E1 from the following regression:
DV = E0 + E1*IV + Controls
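
The identity E1 = c'+a*b can be checked numerically. The following is a minimal sketch on simulated data, not the Stata analysis from the question: the variable names (`iv`, `mv`, `dv`, `control`), the `ols` helper, and all data-generating coefficients are made up for illustration. It assumes that every regression uses the same sample and the same controls.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data; every coefficient here is an arbitrary illustration value.
control = rng.normal(size=n)
iv = 0.5 * control + rng.normal(size=n)
mv = 0.7 * iv + 0.3 * control + rng.normal(size=n)
dv = 0.4 * iv + 0.6 * mv + 0.2 * control + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients; index 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


full = ols(dv, mv, iv, control)      # DV on MV, IV, controls
b, c_prime = full[1], full[2]        # B1 = b, B2 = c'
a = ols(mv, iv, control)[1]          # MV on IV, controls: K1 = a
total = ols(dv, iv, control)[1]      # DV on IV, controls: E1

print(f"c' + a*b = {c_prime + a * b:.6f}")
print(f"E1       = {total:.6f}")
```

With OLS the two printed values should agree up to floating-point error, because the decomposition is an algebraic property of least squares when the sample and controls are held fixed. If the equations are estimated on different samples (for example, because of missing values in MV), the identity no longer has to hold exactly.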

To my understanding, a*b is the omitted variable bias that arises when MV is left out of the regression of DV on IV. Thus, the entire SEM could be seen as an application of the Frisch-Waugh-Lovell theorem (Greene, 2003; pp. 148-149).

However, when I compare the results from this series of OLS regressions with those from a dedicated SEM routine (I use Stata's built-in GSEM), the results do not match. In particular, while "a" and "b" are similar, "c'" is very different.

Here is a related post on Statalist, where I discuss the script. The Statalist question focuses on the script itself; here I focus on the methodology.

[I am new to SEM]

Best Answer

Are results obtained from a structural equation model (SEM) comparable to those from a series of separate regressions?

Yes, unless the SEM imposes constraints that are absent from the separate regressions. Such constraints (especially when invalid) can systematically bias the estimated effects of interest. In SEM software, the optimizer tries to find a set of parameters that reproduces the whole covariance matrix (and mean vector), so "propagation of errors" is a problem: a misspecification in one part of the model can distort estimates elsewhere. Some estimators are more robust to this (e.g., SAM or MIIV-SEM).

https://osf.io/pekbm/ (SAM preprint)

Bollen, K. A. (2019). Model implied instrumental variables (MIIVs): An alternative orientation to structural equation modeling. Multivariate Behavioral Research, 54(1), 31-46. https://doi.org/10.1080/00273171.2018.1483224

Or, conversely, can I obtain the direct, indirect, and total effects of an independent variable of interest by running three separate regressions (e.g. OLS)?

For OLS, yes (again, assuming the SEM is unconstrained, i.e. just-identified with df = 0). This does not generally hold for generalized linear models such as logit or probit:

Breen, R., Karlson, K. B., & Holm, A. (2013). Total, direct, and indirect effects in logit and probit models. Sociological Methods & Research, 42(2), 164-191. https://doi.org/10.1177/0049124113494572
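
To illustrate the point about generalized linear models, here is a minimal sketch with a binary DV, again on simulated data. The `logit` helper is a hand-rolled Newton-Raphson fit written for this example, not a library routine, and all names and data-generating coefficients are assumptions. It compares c'+a*b, with b and c' taken from a logit of DV on MV and IV, against the coefficient from a logit of DV on IV alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Simulated data; coefficients are arbitrary illustration values.
iv = rng.normal(size=n)
mv = 0.8 * iv + rng.normal(size=n)
prob = 1.0 / (1.0 + np.exp(-(0.5 * iv + 1.5 * mv)))
dv = (rng.uniform(size=n) < prob).astype(float)   # binary outcome


def ols(y, *regressors):
    """Return OLS coefficients; index 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def logit(y, *regressors, iters=25):
    """Logistic regression by Newton-Raphson; index 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = mu * (1.0 - mu)                       # Newton weights
        grad = X.T @ (y - mu)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


a = ols(mv, iv)[1]                  # MV is continuous, so OLS gives a
full = logit(dv, mv, iv)            # logit of DV on MV and IV
b, c_prime = full[1], full[2]
total = logit(dv, iv)[1]            # logit of DV on IV alone

print(f"c' + a*b        = {c_prime + a * b:.3f}")
print(f"logit DV on IV  = {total:.3f}")
```

In this setup the two quantities should differ noticeably, because logit coefficients are rescaled when a predictor of the outcome is dropped, even when nothing is misspecified. Breen, Karlson, and Holm (2013) describe a correction for this; the sketch above only demonstrates the discrepancy and does not implement their method.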