Obtain expressions for coefficients from the OLS formulas


Consider the standard linear regression model: $y_i = \alpha + \beta D_i + e_i$ where the coefficients are defined by linear projections and $D_i$ is a dummy variable. In the population, the coefficients are given by:

$$\alpha = E[y_i \mid D_i =0] \ \text{and} \ \beta = E[y_i \mid D_i = 1] - E[y_i \mid D_i =0]$$

Using OLS to estimate the coefficients, we get:

$$\widehat{\alpha} = \frac{1}{\sum_{i=1}^{N}1(D_i=0)}\sum_{i=1}^{N}1(D_i=0)y_i $$

$$\widehat{\beta} = \frac{1}{\sum_{i=1}^{N}1(D_i=1)}\sum_{i=1}^{N}1(D_i=1)y_i-\frac{1}{\sum_{i=1}^{N}1(D_i=0)}\sum_{i=1}^{N}1(D_i=0)y_i $$

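Here $1(\cdot)$ is the indicator function. In other words, $\widehat{\alpha}$ is just the sample mean of $y_i$ in the subsample with $D_i=0$, and $\widehat{\beta}$ is the difference between the sample means of the $D_i=1$ and $D_i=0$ subsamples.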
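As a quick illustration of the indicator-weighted formulas for $\widehat{\alpha}$ and $\widehat{\beta}$, here is a minimal Python/NumPy sketch. The function name `dummy_group_estimates` and the toy data are hypothetical, made up for illustration, and the sketch assumes both groups ($D_i=0$ and $D_i=1$) are non-empty.

```python
import numpy as np


def dummy_group_estimates(y, d):
    """Hypothetical helper: alpha-hat and beta-hat from the subsample-mean formulas.

    y: outcomes; d: 0/1 dummy. Assumes both groups are non-empty.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d)
    in0 = d == 0  # indicator 1(D_i = 0)
    in1 = d == 1  # indicator 1(D_i = 1)
    # alpha-hat: sum of 1(D_i=0) * y_i divided by the number of D_i = 0 observations
    alpha_hat = (in0 * y).sum() / in0.sum()
    # beta-hat: mean of the D_i = 1 subsample minus alpha-hat
    beta_hat = (in1 * y).sum() / in1.sum() - alpha_hat
    return float(alpha_hat), float(beta_hat)


# Toy data (illustrative only): group means are 4 for D=1 and 2 for D=0
y = [2.0, 4.0, 6.0, 1.0, 3.0]
d = [1, 1, 1, 0, 0]
print(dummy_group_estimates(y, d))
```

Multiplying by a boolean mask and dividing by its sum is exactly the $\frac{1}{\sum 1(\cdot)}\sum 1(\cdot)\,y_i$ form written above.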

My question is, how can we arrive at the above coefficient estimates by using the standard OLS formulas? That is:

$$\widehat{\alpha} = \overline{y} - \overline{D}\widehat{\beta} \ \ \text{and} \ \ \widehat{\beta} = \frac{\sum_{i=1}^{N}(D_i - \overline{D})(y_i - \overline{y})}{\sum_{i=1}^{N}(D_i - \overline{D})^2}$$ where the bars represent sample means.
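A small numerical check of the question's claim can be sketched by applying the standard OLS formulas directly to a 0/1 regressor and comparing with the subsample means. The function `ols_simple` and the toy data are hypothetical; the sketch assumes both values of $D$ occur, so that $\sum_i (D_i - \overline{D})^2 > 0$.

```python
import numpy as np


def ols_simple(y, d):
    """Hypothetical helper: simple-regression OLS using the textbook formulas."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    d_bar, y_bar = d.mean(), y.mean()
    # beta-hat = sum (D_i - D_bar)(y_i - y_bar) / sum (D_i - D_bar)^2
    beta_hat = np.sum((d - d_bar) * (y - y_bar)) / np.sum((d - d_bar) ** 2)
    # alpha-hat = y_bar - D_bar * beta-hat
    alpha_hat = y_bar - d_bar * beta_hat
    return float(alpha_hat), float(beta_hat)


# Toy data (illustrative only)
y = np.array([2.0, 4.0, 6.0, 1.0, 3.0])
d = np.array([1, 1, 1, 0, 0])
alpha_hat, beta_hat = ols_simple(y, d)

# Compare with the subsample-mean expressions
print(alpha_hat, y[d == 0].mean())
print(beta_hat, y[d == 1].mean() - y[d == 0].mean())
```

The two numbers on each printed line should agree up to floating-point rounding, which is what the algebra in the answer establishes in general.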

Best Answer

Denote by $n_0$ the number of zeros of $D$ and by $n_1$ the number of ones, so that the total number of observations is $n = n_0 + n_1$, and let $\bar{y}_0$ and $\bar{y}_1$ be the sample means of $y$ in the $D=0$ and $D=1$ subsamples. For $\beta$ you can see the full derivation here; compactly, it can be written as $$ \hat{\beta} = \bar{y}_1 - \bar{y}_0. $$

For $\alpha$ you can just plug in this result, using $\bar{D} = n_1/n$ and the fact that the full-sample mean is $\bar{y}_n = (n_1\bar{y}_1 + n_0\bar{y}_0)/n$: \begin{align} \hat{\alpha} &= \bar{y}_n - \bar{D}\hat{\beta}\\ &=\frac{n_1\bar{y}_1 + n_0\bar{y}_0}{ n_0 + n_1} - \frac{ n_1 }{ n_0 + n_1 }( \bar{y}_1 - \bar{y}_0) \\ &= \frac{n_0\bar{y}_0 + n_1\bar{y}_0}{n_0 + n_1} \\ & = \frac{ n_0 + n_1}{n_0 + n_1} \bar{y}_0 \\ &=\bar{y}_0. \end{align}
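The step for $\hat{\beta}$ can also be written out directly from the OLS formula in the question. Let $\bar{y}_0$, $\bar{y}_1$ be the subsample means, $\bar{y}$ the full-sample mean, and assume $n_0, n_1 > 0$. Two facts do all the work: $D_i^2 = D_i$ for a dummy, so $\sum_i D_i^2 = n_1$ and $\bar{D} = n_1/n$; and $\sum_i D_i y_i = n_1 \bar{y}_1$, with $n\bar{y} = n_1\bar{y}_1 + n_0\bar{y}_0$:

$$
\begin{aligned}
\sum_{i=1}^{n}(D_i-\bar{D})^2 &= \sum_{i=1}^{n} D_i^2 - n\bar{D}^2 = n_1 - \frac{n_1^2}{n} = \frac{n_0 n_1}{n},\\
\sum_{i=1}^{n}(D_i-\bar{D})(y_i-\bar{y}) &= \sum_{i=1}^{n} D_i y_i - n\bar{D}\,\bar{y}
 = n_1\bar{y}_1 - \frac{n_1}{n}\left(n_1\bar{y}_1 + n_0\bar{y}_0\right)
 = \frac{n_0 n_1}{n}\left(\bar{y}_1 - \bar{y}_0\right),\\
\hat{\beta} &= \frac{\frac{n_0 n_1}{n}\left(\bar{y}_1 - \bar{y}_0\right)}{\frac{n_0 n_1}{n}} = \bar{y}_1 - \bar{y}_0 .
\end{aligned}
$$

The middle line uses $n - n_1 = n_0$ after putting $n_1\bar{y}_1$ over the common denominator $n$.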
