OLS – Reformulation of OLS Estimators in Simple Regression with a Dummy Variable

Tags: least-squares, self-study

In the classical regression model, i.e. $E(y\mid x)=\alpha +\beta x$ with $\operatorname{Var}(y\mid x)=\sigma^2$, with only two coefficients, the intercept $\alpha$ and the slope $\beta$ of a dummy variable $x$, we can interpret $\alpha$ as the mean of the $y$-values for which $x=0$, and $\beta$ as the difference between the mean of the data where $x=1$ and the mean where $x=0$. This makes intuitive sense, but how can I formally show that these special expressions follow from the standard definitions:
$$\hat{\alpha}=\bar y -\hat{\beta} \bar x$$ and
$$\hat{\beta}=\frac{\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y)}{\frac{1}{n}\sum (x_i-\bar x)^2}.$$

I cannot reach the formulation in terms of group means.
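For concreteness, here is what I mean numerically (a minimal sketch with made-up data, using numpy; the variable names are my own):

```python
import numpy as np

# Made-up data: x is a 0/1 dummy, y is the response
x = np.array([0, 0, 0, 1, 1, 1, 1], dtype=float)
y = np.array([2.0, 3.0, 4.0, 7.0, 8.0, 6.0, 9.0])

# The standard OLS formulas from above
beta_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
alpha_hat = y.mean() - beta_hat * x.mean()

# The group means the claim refers to
ybar_0 = y[x == 0].mean()   # mean of y where x = 0
ybar_1 = y[x == 1].mean()   # mean of y where x = 1

print(alpha_hat, ybar_0)           # both are (approx.) 3.0: alpha_hat is the x = 0 group mean
print(beta_hat, ybar_1 - ybar_0)   # both are (approx.) 4.5: beta_hat is the difference in means
```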

Best Answer

The theoretical model is

$$E(Y\mid X)=\alpha +\beta X$$

Assuming that $X$ is a $0/1$ binary variable, we notice that

$$E(Y\mid X=1) - E(Y\mid X=0)=\alpha +\beta -\alpha = \beta $$

I think the OP is asking: does the OLS estimator "mimic" this relationship, being perhaps its sample analogue?

Let's see: we have that

$$\hat{\beta}=\frac{\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y)}{\frac{1}{n}\sum (x_i-\bar x)^2} = \frac {\widehat{\operatorname{Cov}}(Y,X)}{\widehat{\operatorname{Var}}(X)} $$

Now, since $X$ is a binary variable, i.e. a Bernoulli random variable, we have that $\operatorname{Var}(X) = p(1-p)$, where $p\equiv P(X=1)$. Under a stationarity assumption, the sample estimate of this probability is simply the sample mean of $X$, denoted $\bar x$, and one can verify that indeed $$\frac{1}{n}\sum (x_i-\bar x)^2 = \widehat{\operatorname{Var}}(X)=\bar x (1-\bar x) =\hat p(1-\hat p)$$
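Indeed, since $x_i\in\{0,1\}$ implies $x_i^2=x_i$, the verification takes one line:

$$\frac{1}{n}\sum (x_i-\bar x)^2 = \frac{1}{n}\sum x_i^2 - \bar x^2 = \bar x - \bar x^2 = \bar x(1-\bar x)$$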

Let's turn now to the covariance. We have

$$\widehat{\operatorname{Cov}}(Y,X)=\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y) = \frac{1}{n}\sum x_iy_i -\bar x \bar y$$

Denote by $n_1$ the number of observations for which $x_i=1$. We can write

$$\frac{1}{n}\sum x_iy_i = \frac{1}{n}\sum_{x_i=1} y_i = \frac{n_1}{n}\cdot \frac{1}{n_1}\sum_{x_i=1} y_i = \hat p\cdot (\bar y \mid X=1) = \hat p \cdot \hat E(Y\mid X=1)$$

Also, $\bar y = \hat E(Y)$, and using the law of total expectation we have

$$\hat E(Y) = \hat E(Y \mid X=1) \cdot \hat p + \hat E(Y \mid X=0)\cdot (1-\hat p)$$
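This sample analogue follows directly by splitting the sum over the two groups:

$$\bar y = \frac{1}{n}\sum_{i} y_i = \frac{n_1}{n}\cdot\frac{1}{n_1}\sum_{x_i=1} y_i + \frac{n-n_1}{n}\cdot\frac{1}{n-n_1}\sum_{x_i=0} y_i = \hat p\,\hat E(Y\mid X=1) + (1-\hat p)\,\hat E(Y\mid X=0)$$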

Inserting all these results into the expression for the sample covariance, we have

$$\widehat{\operatorname{Cov}}(Y,X)= \hat p \cdot \hat E(Y\mid X=1) - \hat p\cdot \left[\hat E(Y \mid X=1) \cdot \hat p + \hat E(Y \mid X=0)\cdot (1-\hat p)\right]$$

$$= \hat p(1-\hat p)\cdot \left[\hat E(Y \mid X=1) - \hat E(Y \mid X=0)\right]$$

Substituting everything into the expression for $\hat \beta$, we have

$$\hat{\beta} = \frac {\widehat{\operatorname{Cov}}(Y,X)}{\widehat{\operatorname{Var}}(X)} = \frac {\hat p(1-\hat p)\cdot \left[\hat E(Y \mid X=1) - \hat E(Y \mid X=0)\right]}{\hat p(1-\hat p)} $$

$$\Rightarrow \hat{\beta} = \hat E(Y \mid X=1) - \hat E(Y \mid X=0)$$

which is the sample analogue/feasible implementation of the theoretical relationship. I leave the corresponding demonstration for $\hat \alpha$ to the OP to work out.
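For readers who want to check the intermediate identities numerically, here is a minimal sketch (made-up data, numpy only; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=50).astype(float)    # a 0/1 dummy
y = 1.0 + 2.0 * x + rng.normal(size=50)          # any response works

p_hat = x.mean()                                  # sample estimate of P(X = 1)
var_hat = ((x - x.mean()) ** 2).mean()            # (1/n)-normalized, as above
cov_hat = ((x - x.mean()) * (y - y.mean())).mean()
diff_means = y[x == 1].mean() - y[x == 0].mean()  # E-hat(Y|X=1) - E-hat(Y|X=0)

print(np.isclose(var_hat, p_hat * (1 - p_hat)))               # True: Var-hat identity
print(np.isclose(cov_hat, p_hat * (1 - p_hat) * diff_means))  # True: Cov-hat identity
print(np.isclose(cov_hat / var_hat, diff_means))              # True: beta_hat = difference of group means
```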