Solved – Causality, omitted variable bias

causality, econometrics, multiple regression, regression

This might be a basic question, but I want to be sure that what I'm doing is right. I have a model that suggests that variable X causes both Y and Z. When I regress Y on X, or Z on X, I get positive and significant coefficients as expected.

Now, when I regress Z on Y, I still get a positive significant coefficient.

Question 1: is this an omitted variable bias?

Question 2: is it legitimate to regress Z on Y and X to test whether the relationship between Z and Y is spurious?

Question 3: if it is legitimate, and if I get positive and significant coefficients on both Y and X, what does that mean? Does it mean "X causes Y and Z, but Y still has marginal explanatory power on Z"?

Many thanks,
Dave

Best Answer

You need to distinguish the causal graph from the regression coefficients here. Something is only 'spurious' if it does not identify the causal effect of interest, and this depends on the graph structure you have assumed, not on any regression coefficients.

As an example (and restricting ourselves to causal DAG structures with no hidden variables), assume X causes Y and X causes Z. Then even if Z does not cause Y, you will be able to regress Y on Z and get a non-zero coefficient, so that coefficient on its own tells you little about causation. Conditioning on X in a regression of Y on Z is the right thing to do if you want to know the causal effect of Z on Y, assuming that X causes both Y and Z and that Z causes Y rather than vice versa. If, on the other hand, Y causes Z, then even though there is no causal effect of Z on Y to estimate, you will again get a non-zero coefficient on Z.
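To make the first case concrete, here is a minimal simulation sketch in Python with NumPy. The linear-Gaussian structure, the coefficients (2.0 and 1.5), the sample size and the helper `ols` are illustrative assumptions, not part of the original answer. The data are generated so that X causes both Y and Z with no arrow between Y and Z; the regression of Y on Z alone still gives a clearly non-zero slope, while adding X as a regressor pushes the coefficient on Z to roughly zero.

```python
import numpy as np

def ols(y, *regressors):
    """Slope coefficients from a least-squares regression of y on the
    regressors (an intercept is included, then dropped from the result)."""
    design = np.column_stack([np.ones_like(y)] + list(regressors))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(0)
n = 100_000

# Assumed DAG: X -> Y and X -> Z, with no arrow between Y and Z.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = 1.5 * x + rng.normal(size=n)

naive = ols(y, z)          # Y on Z only
adjusted = ols(y, z, x)    # Y on Z and X

print("Y ~ Z:     coef on Z =", round(naive[0], 3))
print("Y ~ Z + X: coef on Z =", round(adjusted[0], 3),
      ", coef on X =", round(adjusted[1], 3))
```

With these assumed coefficients the population slope of Y on Z alone is Cov(Y, Z)/Var(Z) = 3/3.25 ≈ 0.92, whereas conditional on X the slope on Z is 0 and the slope on X is 2. Swapping the roles of Y and Z (the regression in the question) gives the same qualitative picture.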

It all depends on which variables are connected by causal arrows and which direction those arrows point. It's sometimes useful to simulate data with the relevant structure and run the regressions to get a feel for what can happen.
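Following that suggestion, here is a second sketch under the same kind of illustrative assumptions (linear-Gaussian noise, made-up coefficients, a hypothetical `ols` helper). This time Y also causes Z, in addition to X causing both, so Z has no causal effect on Y at all. Regressing Y on Z and X still produces a non-zero coefficient on Z, and the coefficient on X also moves away from X's true total effect on Y (2.0), because Z is a descendant of Y and conditioning on it absorbs part of the variation in Y.

```python
import numpy as np

def ols(y, *regressors):
    """Slope coefficients from a least-squares regression of y on the
    regressors (an intercept is included, then dropped from the result)."""
    design = np.column_stack([np.ones_like(y)] + list(regressors))
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(1)
n = 100_000

# Assumed DAG: X -> Y, X -> Z and Y -> Z. Z has no causal effect on Y.
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = 1.5 * x + 0.5 * y + rng.normal(size=n)

coef_z, coef_x = ols(y, z, x)   # regress Y on Z and X
print("Y ~ Z + X: coef on Z =", round(coef_z, 3),
      ", coef on X =", round(coef_x, 3))
```

Under these assumed coefficients the population values are 0.4 for Z and 1.0 for X. Both are non-zero and would be highly significant at this sample size, yet the coefficient on Z does not reflect any causal effect of Z on Y. This is why positive, significant coefficients on both regressors (question 3) cannot be read causally without first committing to a graph.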

There are some situations where causal structure can be inferred from regressing variables on one another and finding zero coefficients, but they are fairly limited. A nice overview can be found in chapter 25 of Shalizi's draft textbook (chapters 21-24 are also worth reading). Leaving aside causal discovery, the basic theoretical framework can be found in compressed form in Pearl's review paper, and as a more leisurely exposition in the references here.

Unfortunately, this means that the answer to each of your three questions is "it depends" (on the graph), but the references above should point you towards what you would have to assume to interpret things the way you're considering.
