Comparative Analysis – Structural Equation Models vs OLS Regressions

least squares, method-comparison, structural-equation-modeling

Are results obtained from a structural equation model (SEM) comparable to those from a series of separate regressions?

Or, conversely, can I obtain the direct, indirect, and total effects of an independent variable of interest by running three separate regressions (e.g., OLS)?

Let us say that IV = independent variable, MV = mediating variable, and DV = dependent variable. I run three separate regressions.

DV = B0 + B1*MV + B2*IV + Controls
B1 is the direct effect of MV on DV, while B2 is the direct effect of IV on DV.
MV = K0 + K1*IV + Controls
K1 is the direct effect of IV on MV.
DV = P0 + P1*IV + Controls
P1 is the total effect of IV on DV (this is the same regression as the one for E1 below); the indirect effect is the product K1*B1.

The notation used in the mediation analysis below is based on that used in this handbook:
Hayes, A. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York, US.

K1 = a is the direct effect of IV on MV
B1 = b is the direct effect of MV on DV
B2 = c' is the direct effect of IV on DV
B2+B1*K1 = c'+a*b is the total effect of IV on DV

In turn, the total effect, that is, c'+a*b, should equal the coefficient E1 from the following regression:
DV = E0 + E1*IV + Controls

To my understanding, a*b is the omitted-variable bias in E1 that arises from leaving MV out of the regression. Thus, the entire SEM could be seen as an application of the Frisch-Waugh-Lovell theorem (Greene, 2003; pp. 148-149).

However, when I compare the results from this series of OLS regressions with those from a dedicated SEM routine (I use Stata's built-in gsem), the results do not match. In particular, while "a" and "b" are similar, "c'" is very different.

Here is a related post on Statalist, where I discuss the script. The Statalist question focuses on the script itself; here I focus on the methodology.

[I am new to SEM]

Best Answer

Are results obtained from a structural equation model (SEM) comparable to those from a series of separate regressions?

Yes, unless the SEM imposes constraints that the separate regressions do not. Such constraints (especially when invalid) can systematically bias the estimated effects of interest. SEM software searches for a single set of parameters that reproduces the whole covariance matrix (and mean vector), so a misspecification in one part of the model can distort estimates elsewhere ("propagation of errors"). Some estimators are more robust to this (e.g., SAM or MIIV-SEM):

https://osf.io/pekbm/ (SAM preprint)

Bollen, K. A. (2019). Model implied instrumental variables (MIIVs): An alternative orientation to structural equation modeling. Multivariate Behavioral Research, 54(1), 31-46. https://doi.org/10.1080/00273171.2018.1483224
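To illustrate how a constraint can bias an estimate, here is a minimal simulation sketch. It does not call any SEM package. Instead it assumes that, for this simple observed-variable path model, forcing c' = 0 (full mediation) amounts to dropping IV from the DV equation. The data-generating values, sample size, and the `ols` helper are all made up for illustration.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slope_1, ...] of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical data-generating process: a = 0.6, b = 0.4, c' = 0.5
rng = np.random.default_rng(1)
n = 20_000
iv = rng.normal(size=n)
mv = 0.6 * iv + rng.normal(size=n)
dv = 0.4 * mv + 0.5 * iv + rng.normal(size=n)

# Unconstrained DV equation: both MV and IV are included
_, b_free, c_prime_free = ols(dv, mv, iv)

# "Constrained" DV equation: c' is forced to 0 by leaving IV out (assumption:
# this stands in for a full-mediation constraint in an SEM)
b_constrained = ols(dv, mv)[1]

print(f"b  (unconstrained): {b_free:.3f}")
print(f"c' (unconstrained): {c_prime_free:.3f}")
print(f"b  (c' forced to 0): {b_constrained:.3f}")
```

With these values the constrained estimate of b converges to cov(MV, DV) / var(MV) = 0.844 / 1.36, roughly 0.62, rather than 0.4: the invalid constraint pushes the ignored direct path into b.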

Or, conversely, can I obtain direct, indirect, and total effect of an independent variable of interest by running three separate regressions (e.g. OLSs)?

For OLS, yes (again, assuming the SEM is unconstrained: df = 0). This does not generally hold for generalized linear models:

Breen, R., Karlson, K. B., & Holm, A. (2013). Total, direct, and indirect effects in logit and probit models. Sociological Methods & Research, 42(2), 164-191. https://doi.org/10.1177/0049124113494572
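The OLS case is easy to check numerically. The sketch below uses simulated data (all coefficients, the single control variable, and the `ols` helper are assumptions made for illustration) and fits the three regressions from the question with the same controls in each. The total effect from the regression of DV on IV then equals c' + a*b up to floating-point error.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients [intercept, slope_1, ...] of y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical simulated data with one control variable
rng = np.random.default_rng(0)
n = 500
control = rng.normal(size=n)
iv = 0.5 * control + rng.normal(size=n)
mv = 0.6 * iv + 0.3 * control + rng.normal(size=n)
dv = 0.4 * mv + 0.2 * iv + 0.1 * control + rng.normal(size=n)

# MV = K0 + K1*IV + Controls        -> a
a = ols(mv, iv, control)[1]

# DV = B0 + B1*MV + B2*IV + Controls -> b and c'
_, b, c_prime, _ = ols(dv, mv, iv, control)

# DV = E0 + E1*IV + Controls         -> total effect c
c_total = ols(dv, iv, control)[1]

indirect = a * b
print(f"a = {a:.4f}, b = {b:.4f}, c' = {c_prime:.4f}")
print(f"c' + a*b = {c_prime + indirect:.6f}")
print(f"c (E1)   = {c_total:.6f}")
```

The identity is exact in-sample only because every regression uses the same observations and the same controls. If the estimation samples differ (for example, through listwise deletion on different variables) or the control sets differ, the two numbers will no longer coincide.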
