The theoretical model is
$$E(Y\mid X)=\alpha +\beta X$$
Assuming that $X$ is a $0/1$ binary variable we notice that
$$E(Y\mid X=1) - E(Y\mid X=0)=\alpha +\beta -\alpha = \beta $$
I think the OP is asking: "Does the OLS estimator 'mimic' this relationship, being perhaps its sample analogue?"
Let's see: we have that
$$\hat{\beta}=\frac{\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y)}{\frac{1}{n}\sum (x_i-\bar x)^2} = \frac {\widehat{\operatorname{Cov}}(Y,X)}{\widehat{\operatorname{Var}}(X)} $$
Now since $X$ is a binary variable, i.e. a Bernoulli random variable, we have that $\operatorname{Var}(X) = p(1-p)$ where $p\equiv P(X=1)$. Under a stationarity assumption, the sample estimate of this probability is simply the sample mean of $X$, denoted $\bar x$, and one can verify that indeed
$$\frac{1}{n}\sum (x_i-\bar x)^2 = \widehat{\operatorname{Var}}(X)=\bar x (1-\bar x) =\hat p(1-\hat p)$$
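As a quick numerical sanity check of this identity, here is a minimal Python sketch (the simulated data are purely illustrative):

```python
import numpy as np

# Check that (1/n) * sum((x_i - xbar)^2) equals xbar * (1 - xbar) for binary data
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=200)      # 200 draws of a 0/1 variable

xbar = x.mean()
lhs = np.mean((x - xbar) ** 2)        # the (1/n)-version of the sample variance
rhs = xbar * (1 - xbar)               # p_hat * (1 - p_hat)

assert np.isclose(lhs, rhs)           # the two values agree
```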
Let's turn now to the covariance. We have
$$\widehat{\operatorname{Cov}}(Y,X)=\frac{1}{n}\sum (x_i-\bar x)(y_i-\bar y) = \frac{1}{n}\sum x_iy_i -\bar x \bar y$$
Denote by $n_1$ the number of observations for which $x_i=1$. We can write
$$\frac{1}{n}\sum x_iy_i = \frac{1}{n}\sum_{x_i=1} y_i = \frac{n_1}{n}\cdot \frac{1}{n_1}\sum_{x_i=1} y_i = \hat p\cdot (\bar y \mid X=1) = \hat p \cdot \hat E(Y\mid X=1)$$
Also $\bar y = \hat E(Y)$ and using the law of total expectations we have
$$\hat E(Y) = \hat E(Y \mid X=1) \cdot \hat p + \hat E(Y \mid X=0)\cdot (1-\hat p)$$
Inserting all these results in the expression for the sample covariance we have
$$\widehat{\operatorname{Cov}}(Y,X)= \hat p \cdot \hat E(Y\mid X=1) - \hat p\cdot \left[\hat E(Y \mid X=1) \cdot \hat p + \hat E(Y \mid X=0)\cdot (1-\hat p)\right]$$
$$= \hat p(1-\hat p)\cdot \left[\hat E(Y \mid X=1) - \hat E(Y \mid X=0)\right]$$
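Before assembling the final result, here is a numerical check of this covariance decomposition (a Python sketch; the data and the parameter values $\alpha=1$, $\beta=2$ are made up for illustration):

```python
import numpy as np

# Check that Cov_hat(Y, X) = p_hat * (1 - p_hat) * (E_hat(Y|X=1) - E_hat(Y|X=0))
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)      # illustrative alpha = 1, beta = 2

p_hat = x.mean()
cov_hat = np.mean((x - x.mean()) * (y - y.mean()))
group_diff = y[x == 1].mean() - y[x == 0].mean()

assert np.isclose(cov_hat, p_hat * (1 - p_hat) * group_diff)
```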
Inserting all of this into the expression for $\hat \beta$ we have
$$\hat{\beta} = \frac {\widehat{\operatorname{Cov}}(Y,X)}{\widehat{\operatorname{Var}}(X)} = \frac {\hat p(1-\hat p)\cdot \left[\hat E(Y \mid X=1) - \hat E(Y \mid X=0)\right]}{\hat p(1-\hat p)} $$
$$\Rightarrow \hat{\beta} = \hat E(Y \mid X=1) - \hat E(Y \mid X=0)$$
which is the sample analogue/feasible implementation of the theoretical relationship. I leave the demonstration related to $\hat \alpha$ for the OP to work out.
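To see the whole chain numerically, here is a short Python sketch (simulated data, illustrative parameter values) confirming that the closed-form $\hat\beta$ equals the difference in group means; the last line also hints, without spoiling the algebra, at what the OP should find for $\hat\alpha$:

```python
import numpy as np

# Fit OLS via the closed-form formulas and compare with the group means
rng = np.random.default_rng(2)
x = rng.integers(0, 2, size=500).astype(float)
y = 1.0 + 2.0 * x + rng.normal(size=500)      # illustrative alpha = 1, beta = 2

beta_hat = np.mean((x - x.mean()) * (y - y.mean())) / np.mean((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

assert np.isclose(beta_hat, y[x == 1].mean() - y[x == 0].mean())
assert np.isclose(alpha_hat, y[x == 0].mean())   # hint for the alpha_hat exercise
```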
We want to show that $\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) (x_i - \bar{x}) = 0$, which is the same as $\sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i ) (x_i - \bar{x}) = \sum_{i=1}^{n} y_i (x_i - \bar{x})$. This roughly says that the weighted average of fitted $y$ values equals the weighted average of actual $y$ values, using weights $x_i - \bar{x}$. For this we just need to do some algebra and remember the definitions $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$ and $\hat{\beta}_1 = \sum_{i=1}^{n} (y_i - \bar{y}) (x_i - \bar{x}) / \sum_{i=1}^{n} (x_i - \bar{x})^2$. Let's start with the left hand side
\begin{align}
\sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i ) (x_i - \bar{x}) &= \sum_{i=1}^{n} (\bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 x_i ) (x_i - \bar{x}) \\
&= \bar{y} \sum_{i=1}^{n} (x_i - \bar{x}) + \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{align}
We know the first term is zero, and the sum of squares $\sum_{i=1}^{n} (x_i - \bar{x})^2$ cancels with the denominator of $\hat{\beta}_1$ leaving us with just $\sum_{i=1}^{n} (y_i - \bar{y}) (x_i - \bar{x}) = \sum_{i=1}^{n} y_i (x_i - \bar{x})$, which is what we wanted to show.
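This too is easy to verify numerically (a Python sketch with simulated data; the orthogonality holds by construction for any sample):

```python
import numpy as np

# Check that the OLS residuals are orthogonal to (x_i - xbar)
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.5 + 1.5 * x + rng.normal(size=100)      # arbitrary illustrative coefficients

b1 = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - b0 - b1 * x

assert np.isclose(np.sum(resid * (x - x.mean())), 0.0, atol=1e-8)
```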
Let the model be $y=X\beta+e$, where $X$ contains $r$ covariates plus a column of ones for the intercept, so that $\beta$ has $r+1$ elements. We assume the other standard assumptions hold. Then the OLS estimator is
$$ \hat\beta=(X'X)^{-1}X'y $$ and
\begin{align} & E(\hat\beta)=(X'X)^{-1}X' E(X\beta +e)=\beta\\ & \operatorname{Var}(\hat\beta)=(X'X)^{-1}X'\,\operatorname{Var}(e)\,X(X'X)^{-1}=\sigma^2(X'X)^{-1} \end{align} where we used $E(e)=0$ and $\operatorname{Var}(e)=\sigma^2 I$. Now everything depends on the distribution of $e$, which in ordinary (classical) regression is assumed Gaussian. As a result, $\hat\beta\sim N\left(\beta,\ \sigma^2(X'X)^{-1}\right)$.
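A Monte Carlo check of these two moments (a sketch, with arbitrary $n$, $\sigma$, and $\beta$ chosen purely for illustration):

```python
import numpy as np

# With fixed X and e ~ N(0, sigma^2 I), beta_hat should be unbiased with
# covariance sigma^2 * (X'X)^{-1}
rng = np.random.default_rng(4)
n, sigma = 50, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta = np.array([1.0, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

draws = np.array([
    XtX_inv @ X.T @ (X @ beta + sigma * rng.normal(size=n))
    for _ in range(20_000)
])

print(draws.mean(axis=0))          # close to beta = (1, 2)
print(np.cov(draws.T))             # close to sigma^2 * (X'X)^{-1}
print(sigma**2 * XtX_inv)
```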