Correlate Residuals in SEM – What Does it Mean?

residuals structural-equation-modeling

I have been reading a paper by Cole, Ciesla, and Steiger, which argues that in many cases allowing residuals to correlate is justified. However, I am not entirely sure what it means for residuals to correlate. A good definition and example would be much appreciated.

Cole, D. A., Ciesla, J. A., & Steiger, J. H. (2007). The insidious effects of failing to include design-driven correlated residuals in latent-variable covariance structure analysis. Psychological Methods, 12, 381–398.

Best Answer

It means that the unexplained parts (the residuals) of two variables are correlated. One way of thinking of this is as a partial correlation.

Say we have two regression equations:

\begin{equation} Y_{1i} = \beta_{11} X_i + \epsilon_{1i} \end{equation}
\begin{equation} Y_{2i} = \beta_{21} X_i + \epsilon_{2i} \end{equation}

Both equations have an $\epsilon$ term. If you estimate them as two separate equations, that's fine. But what if you estimate them together in one model - do you want to assume that the $\epsilon$ terms are uncorrelated? If you do, then don't correlate them - as in, don't estimate a correlation between the residuals. Usually you don't want to assume that, so you'd correlate the residuals.
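To make the choice concrete, here is a sketch of what each option implies for the covariance between the two outcomes. The symbols $\sigma_1^2$, $\sigma_2^2$ and $\sigma_{12}$ are my own notation for the residual variances and covariance, the two slopes are written $\beta_{11}$ and $\beta_{21}$, and I am assuming $X$ is uncorrelated with both residuals:

\begin{equation} \mathrm{Cov}\begin{pmatrix} \epsilon_{1i} \\ \epsilon_{2i} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \end{equation}
\begin{equation} \mathrm{Cov}(Y_{1i}, Y_{2i}) = \beta_{11} \beta_{21} \mathrm{Var}(X_i) + \sigma_{12} \end{equation}

Not correlating the residuals fixes $\sigma_{12} = 0$, which forces all of the covariance between the two outcomes to run through $X$. Correlating them lets $\sigma_{12}$ be estimated.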

An example: Say you want to look at the effect of age (in adults) on two outcomes: speed at running 100m and speed at running 5 miles. I'd expect a negative relationship for both of these, but if you modeled them together in one model, you'd expect the unexplained variance in 100m speed to be correlated with the unexplained variance in 5 mile speed, controlling for age - so the residuals are correlated.
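If it helps, the running example can be simulated. This is only a sketch on made-up data: the coefficients, the noise levels, and the unmeasured `fitness` variable are all assumptions I chose for illustration, not estimates from any study. It regresses each speed on age, correlates the two sets of residuals, and checks that this matches the partial correlation computed from the pairwise correlations.

```python
import numpy as np


def residuals(y, x):
    """Residuals from an OLS regression of y on x, with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def simulate(n=5000, seed=0):
    """Made-up data: age lowers both speeds, and an unmeasured 'fitness' raises both."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 70, n)
    fitness = rng.normal(0, 1, n)  # hypothetical common cause, never given to the model
    sprint = 9.0 - 0.03 * age + 0.5 * fitness + rng.normal(0, 0.5, n)
    distance = 4.5 - 0.02 * age + 0.4 * fitness + rng.normal(0, 0.4, n)
    return age, sprint, distance


age, sprint, distance = simulate()

# Correlation between the parts of each speed that age does not explain.
res_corr = np.corrcoef(residuals(sprint, age), residuals(distance, age))[0, 1]

# Partial correlation of the two speeds given age, from pairwise correlations.
r = np.corrcoef([sprint, distance, age])
r12, r13, r23 = r[0, 1], r[0, 2], r[1, 2]
partial = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))

print(f"residual correlation: {res_corr:.3f}")
print(f"partial correlation:  {partial:.3f}")
```

The two printed numbers agree, which is the "partial correlation" reading of a residual correlation. With these made-up coefficients the population value is 0.5, so the sample value should land close to that.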

You can also think of this in terms of latent variables - there are common causes of the residual for both 100m and 5 mile speeds, and hence you can hypothesize the existence of a latent (unmeasured) variable.
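As a sketch of that latent-variable reading (the factor $F_i$, the loadings $\lambda_1, \lambda_2$ and the leftover terms $\delta_{1i}, \delta_{2i}$ are notation I am introducing here), suppose each residual is partly driven by the same unmeasured variable:

\begin{equation} \epsilon_{1i} = \lambda_1 F_i + \delta_{1i} \end{equation}
\begin{equation} \epsilon_{2i} = \lambda_2 F_i + \delta_{2i} \end{equation}

If the $\delta$ terms are uncorrelated with each other and with $F_i$, then

\begin{equation} \mathrm{Cov}(\epsilon_{1i}, \epsilon_{2i}) = \lambda_1 \lambda_2 \mathrm{Var}(F_i) \end{equation}

so a nonzero residual covariance is what you would see whenever both loadings are nonzero.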