Solved – Reverse causality, a bigger problem than I initially thought

Tags: causality, correlation, econometrics, regression

Take a standard regression framework: $$ Y_{it} =\beta X_{it} + \epsilon_{it}$$
Assume for simplicity that no omitted variables exist, nor are there simultaneity or measurement problems.

In several economic disciplines I have come across literature that occasionally makes statements of the following nature:
"The regression only implies correlation, but does not imply causation. We cannot rule out that Y causes X rather than X causing Y."

I am currently in a situation like this myself, and I wonder: isn't this a far bigger problem than most of these authors make it out to be?
If Y causes X, and Y by definition is correlated with its own error term, wouldn't X be endogenous and the estimated coefficient biased?
That would make it impossible to say anything about both correlation and causation, rather than just causation.

Edit: The regression implies correlation, it does not show it. Thanks @Tim.

Best Answer

Assume that the true causal relation is

$$x_i = ay_i + u_i \tag{1}$$

with $u_i$ independent of $y_i$, but we misspecify

$$y_i = bx_i + \epsilon_i \tag{2}$$

And we get the theoretical relationship (substituting $(1)$ in $(2)$ and applying expected values)

$$b = \frac 1a \tag{3}$$

Attempting an OLS estimation for $b$ we get

$$\hat b = \frac {\sum x_iy_i}{\sum x_i^2}$$

What does this estimate in reality?
We need to plug in eq.$(1)$ to find out (since this is the true causal relationship, by assumption, while $(2)$ is just a figment of our imagination). We get

$$\hat b = \frac {\sum (ay_i + u_i)y_i}{\sum (ay_i + u_i)^2} = \frac {a\sum y_i^2 + \sum u_iy_i}{a^2\sum y_i^2 + 2a\sum u_iy_i + \sum u_i^2}$$
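As a quick sanity check (my own sketch, not part of the answer; the values of $a$, $\sigma_u$ and $n$ are arbitrary choices), we can simulate the true model $(1)$ and confirm that the OLS slope of $y$ on $x$ equals the expanded ratio above:

```python
import numpy as np

# Simulate the true causal relation (1): x_i = a*y_i + u_i,
# with u independent of y. Parameter values are arbitrary.
rng = np.random.default_rng(0)
n, a, sigma_u = 10_000, 2.0, 1.0
y = rng.normal(size=n)
u = rng.normal(scale=sigma_u, size=n)
x = a * y + u

# OLS slope from the misspecified regression (2): y on x
b_hat = np.sum(x * y) / np.sum(x**2)

# The same quantity after substituting (1) into the sums
expanded = (a * np.sum(y**2) + np.sum(u * y)) / (
    a**2 * np.sum(y**2) + 2 * a * np.sum(u * y) + np.sum(u**2)
)
print(b_hat, expanded)  # identical up to floating-point rounding
```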

This is certainly a biased estimator. Asymptotically, given the independence between $y_i$ and $u_i$ (orthogonality would suffice), multiplying numerator and denominator by $1/n$ we will get

$$\hat b \xrightarrow{p} \hat b_p = \frac {aE(y^2)}{a^2E(y^2) + \sigma_u^2} = \frac 1a \cdot \left(\frac {E(y^2)}{E(y^2)+ (\sigma_u/a)^2}\right) \tag{4}$$

This shows that $\hat b$ is an inconsistent estimator for $1/a$. The term in the big parenthesis is always positive and smaller than unity, so we get the "attenuation bias" (bias towards zero) phenomenon, i.e. the plim of $\hat b$ will be closer to the zero value than $1/a$ irrespective of whether $a$ is positive or negative.
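A small simulation makes the attenuation visible (my own illustration, not from the answer; I take $y_i \sim N(0,1)$ so that $E(y^2)=1$, with $a=2$ and $\sigma_u=1$ as arbitrary choices, so formula $(4)$ predicts a plim of $0.4$ while $1/a = 0.5$):

```python
import numpy as np

# Large-n check of formula (4): plim of b_hat under the true model (1).
rng = np.random.default_rng(1)
n, a, sigma_u = 1_000_000, 2.0, 1.0
y = rng.normal(size=n)                 # E(y^2) = 1
u = rng.normal(scale=sigma_u, size=n)
x = a * y + u

b_hat = np.sum(x * y) / np.sum(x**2)

# Formula (4) with E(y^2) = 1: (1/a) * 1 / (1 + (sigma_u/a)^2)
plim = (1 / a) * 1.0 / (1.0 + (sigma_u / a) ** 2)   # = 0.4 here, vs 1/a = 0.5
print(b_hat, plim)  # b_hat ~ 0.4: attenuated toward zero relative to 0.5
```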

Can we do anything else? Well, what if we attempt to estimate the variance using the residuals? We have

$$\hat \sigma^2_{\epsilon} = \frac 1n \sum \hat \epsilon_i^2 = \frac 1n \sum [y_i-\hat b(ay_i +u_i)]^2 = \frac 1n \sum [(1-\hat b a)y_i-\hat bu_i]^2$$

$$= (1-\hat ba)^2\frac 1n \sum y_i^2 -2(1-\hat b a)\hat b\frac 1n\sum y_iu_i + \hat b^2\frac 1n\sum u_i^2$$

The probability limit of this is

$$\hat \sigma^2_p = (1-\hat b_pa)^2E(y^2) + \hat b_p^2 \sigma^2_u \tag{5}$$

Now note that:

a) for the left-hand sides of $(4)$ and $(5)$ we have consistent estimates from the estimation procedure (since they are the actual probability limits of the estimators we used);

b) we can estimate $E(y^2)$ consistently.

So if you rearrange $(4)$ to solve for $a$, rearrange $(5)$ to solve for $\sigma^2_u$, and use $\hat b$ instead of $\hat b_p$, $\hat \sigma^2_{\epsilon}$ instead of $\hat \sigma^2_p$, and $(1/n)\sum y_i^2$ instead of $E(y^2)$ you have a system of two equations in two unknowns ($a$ and $\sigma^2_u$).
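Here is a sketch of that final step (the substitution algebra is mine, not spelled out in the answer): eliminating $\sigma^2_u$ between the two rearranged equations gives $\hat \sigma^2_{\epsilon} = m(1-a\hat b)$ with $m = (1/n)\sum y_i^2$, hence $a = (m - \hat \sigma^2_{\epsilon})/(m\hat b)$ and $\sigma^2_u = a\hat \sigma^2_{\epsilon}/\hat b$. With simulated data (same arbitrary parameter choices as above) the system recovers the true values:

```python
import numpy as np

# Simulate the true model (1), then solve the two-equation system
# from (4) and (5) for a and sigma_u^2 using sample counterparts.
rng = np.random.default_rng(2)
n, a_true, sigma_u_true = 1_000_000, 2.0, 1.0
y = rng.normal(size=n)
x = a_true * y + rng.normal(scale=sigma_u_true, size=n)

b = np.sum(x * y) / np.sum(x**2)        # \hat b
m = np.mean(y**2)                        # consistent estimate of E(y^2)
s2 = np.mean((y - b * x) ** 2)           # \hat sigma^2_epsilon from residuals

# Closed-form solution of the system (substitution of (4) into (5))
a_rec = (m - s2) / (m * b)               # recovered a
sigma_u2_rec = a_rec * s2 / b            # recovered sigma_u^2
print(a_rec, sigma_u2_rec)               # close to a_true = 2, sigma_u^2 = 1
```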

Does it give a solution?
