Estimated bias due to endogeneity, formula in Adda et al. (2011)

For a general model
$$y_{i} = \alpha + \beta_{1}X_{1} + \beta_{2}X_{2} + \epsilon_{i}$$
regressing $y_{i}$ on $X_{1}$ alone yields a biased estimate of $\beta_{1}$, whose probability limit is:
$$plim \: \widehat{\beta}_{1} = \beta_{1} + \beta_{2}\frac{Cov(X_{1},X_{2})}{Var(X_{1})}$$
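As a quick numerical check of this formula, here is a minimal simulation sketch. The coefficient values, sample size, and the way $X_{2}$ is made to depend on $X_{1}$ are all hypothetical choices for illustration, and $\epsilon$ is assumed to be independent of both regressors.

```python
import numpy as np

# Hypothetical data-generating process, chosen only for illustration.
rng = np.random.default_rng(1)
n = 200_000
beta1, beta2 = 2.0, -1.5

x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # X2 is correlated with X1
eps = rng.normal(size=n)             # error independent of X1 and X2
y = 1.0 + beta1 * x1 + beta2 * x2 + eps

# Slope from the short regression of y on X1 (with intercept): Cov(X1, y) / Var(X1)
c = np.cov(x1, x2)                   # 2x2 sample covariance matrix of (X1, X2)
short_slope = np.cov(x1, y)[0, 1] / c[0, 0]

# Value predicted by the omitted variable bias formula
predicted = beta1 + beta2 * c[0, 1] / c[0, 0]

print(short_slope, predicted)
```

In a large sample the two printed numbers should be close; they differ only by the sample analogue of $Cov(X_{1},\epsilon)/Var(X_{1})$, which goes to zero under the independence assumption.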

Now I stumbled across a paper which has a model of the form
$$S_{i} = \alpha D_{i} + \beta P_{i} + u_{i}$$
without any other variables and no intercept. The true value of $\beta$ is supposed to be zero because the event $S$ precedes event $P$. The authors then state the omitted variable bias of the two coefficients as:
$$\widehat{\alpha}_{OLS} = \alpha + \frac{Cov(D,u)Var(P) - Cov(D,P)Cov(P,u)}{Var(D)Var(P) - Cov(D,P)^{2}}$$
and
$$\widehat{\beta}_{OLS} = \frac{Cov(P,u)Var(D) - Cov(D,P)Cov(D,u)}{Var(D)Var(P) - Cov(D,P)^{2}}$$

I have tried for a long time now to reproduce this result but I cannot figure out how they arrived at these two bias expressions. The problematic point is that I apparently cannot understand from which OVB formula they start because in their setting it cannot be the usual one for a single variable written above. The paper is by Adda et al. (2011) for which the posed problem can be found on PDF page 52.

If someone could point me in the right direction I would be most grateful.

Best Answer

The "bias" they mention in Appendix A of the article is not omitted variable bias, but a bias due to possible endogeneity. (They mention it several times in the article, specifically at the bottom of page 11 of the PDF.)
The estimator formulas in the Appendix are the regular OLS estimator formulas for two regressors, as can be found on page 8 here:

For $S_{i} = \alpha D_{i} + \beta P_{i} + u_{i}$:

$$\widehat{\alpha}_{OLS} = \frac{Cov(D,S)Var(P) - Cov(D,P)Cov(P,S)}{Var(D)Var(P) - Cov(D,P)^{2}} = \alpha + \frac{Cov(D,u)Var(P) - Cov(D,P)Cov(P,u)}{Var(D)Var(P) - Cov(D,P)^{2}}$$

$$\widehat{\beta}_{OLS} = \frac{Cov(P,S)Var(D) - Cov(D,P)Cov(D,S)}{Var(D)Var(P) - Cov(D,P)^{2}} = \beta + \frac{Cov(P,u)Var(D) - Cov(D,P)Cov(D,u)}{Var(D)Var(P) - Cov(D,P)^{2}}$$
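
The second equality in each expression comes from substituting the model into the covariances that involve $S$. A sketch of that step, assuming the variables are measured as deviations from their means (or, equivalently, that an intercept is included) so that the $Cov$/$Var$ form of the OLS formulas applies:

$$Cov(D,S) = \alpha Var(D) + \beta Cov(D,P) + Cov(D,u)$$

$$Cov(P,S) = \alpha Cov(D,P) + \beta Var(P) + Cov(P,u)$$

Plugging these into the numerator of $\widehat{\alpha}_{OLS}$ gives

$$Cov(D,S)Var(P) - Cov(D,P)Cov(P,S) = \alpha\left[Var(D)Var(P) - Cov(D,P)^{2}\right] + Cov(D,u)Var(P) - Cov(D,P)Cov(P,u)$$

because the two terms in $\beta$, namely $\beta Cov(D,P)Var(P)$ and $-\beta Cov(D,P)Var(P)$, cancel. Dividing by $Var(D)Var(P) - Cov(D,P)^{2}$ leaves $\alpha$ plus the bias term. The same steps applied to the numerator of $\widehat{\beta}_{OLS}$ make the $\alpha$ terms cancel and leave $\beta$ plus its bias term.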

They state that $\beta$ equals 0, and that is how they obtain the formulas in the article. Again, the possible bias they mention arises from possible endogeneity, not from any omission.
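
To see the formulas at work, here is a minimal simulation sketch. The data-generating process (the coefficients linking $D$, $P$ and $u$, the sample size, and $\alpha = 1$) is entirely hypothetical and not taken from the paper; the helper `slopes_from_moments` is also just an illustrative name. An intercept is included in the fitted regression so that the $Cov$/$Var$ expressions match the OLS slopes.

```python
import numpy as np

def slopes_from_moments(d, p, s):
    """Two-regressor OLS slopes written with sample variances and covariances."""
    c = np.cov(np.vstack([d, p, s]))          # rows are the variables D, P, S
    var_d, var_p = c[0, 0], c[1, 1]
    cov_dp, cov_ds, cov_ps = c[0, 1], c[0, 2], c[1, 2]
    det = var_d * var_p - cov_dp ** 2
    a = (cov_ds * var_p - cov_dp * cov_ps) / det
    b = (cov_ps * var_d - cov_dp * cov_ds) / det
    return a, b

# Hypothetical data-generating process with endogenous D and P.
rng = np.random.default_rng(0)
n = 100_000
alpha, beta = 1.0, 0.0                        # beta = 0, as in the paper's setup
u = rng.normal(size=n)
D = 0.5 * u + rng.normal(size=n)              # D is correlated with u
P = 0.3 * D + 0.4 * u + rng.normal(size=n)    # P is correlated with D and u
S = alpha * D + beta * P + u

# OLS slopes from the moment formulas, and the bias terms (same formulas with u in place of S)
alpha_hat, beta_hat = slopes_from_moments(D, P, S)
bias_alpha, bias_beta = slopes_from_moments(D, P, u)

# Cross-check against a least-squares fit with an intercept
X = np.column_stack([np.ones(n), D, P])
coef, *_ = np.linalg.lstsq(X, S, rcond=None)

print(alpha_hat, alpha + bias_alpha, coef[1])
print(beta_hat, beta + bias_beta, coef[2])
```

Within each printed line the three numbers should agree up to floating-point error, since the decomposition is an exact in-sample identity. Note that the estimate of $\beta$ is clearly nonzero even though the true $\beta$ is 0 and no regressor has been omitted: the bias here comes only from the correlation of $D$ and $P$ with $u$.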