The principle underlying least squares regression is that the sum of the squares of the errors is minimized. We can use calculus to find equations for the parameters $\beta_0$ and $\beta_1$ that minimize the sum of the squared errors, $S$.
$$S = \displaystyle\sum\limits_{i=1}^n \left(e_i \right)^2= \sum \left(y_i - \hat{y_i} \right)^2= \sum \left(y_i - \beta_0 - \beta_1x_i\right)^2$$
We want to find $\beta_0$ and $\beta_1$ that minimize the sum, $S$. We start by taking the partial derivative of $S$ with respect to $\beta_0$ and setting it to zero.
$$\frac{\partial{S}}{\partial{\beta_0}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-1) = 0$$
$$\sum \left(y_i - \beta_0 - \beta_1x_i\right) = 0 $$
$$\sum \beta_0 = \sum y_i -\beta_1 \sum x_i $$
$$n\beta_0 = \sum y_i -\beta_1 \sum x_i $$
$$\beta_0 = \frac{1}{n}\sum y_i -\beta_1 \frac{1}{n}\sum x_i \tag{1}$$
$$\beta_0 = \bar y - \beta_1 \bar x \tag{*} $$
Now take the partial derivative of $S$ with respect to $\beta_1$ and set it to zero.
$$\frac{\partial{S}}{\partial{\beta_1}} = \sum 2\left(y_i - \beta_0 - \beta_1x_i\right)^1 (-x_i) = 0$$
$$\sum x_i \left(y_i - \beta_0 - \beta_1x_i\right) = 0$$
$$\sum x_iy_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 = 0 \tag{2}$$
Substitute $(1)$ into $(2)$:
$$\sum x_iy_i - \left( \frac{1}{n}\sum y_i -\beta_1 \frac{1}{n}\sum x_i\right) \sum x_i - \beta_1 \sum x_i^2 = 0 $$
$$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i + \beta_1 \frac{1}{n} \left( \sum x_i \right) ^2 - \beta_1 \sum x_i^2 = 0 $$
$$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i = - \beta_1 \frac{1}{n} \left( \sum x_i \right) ^2 + \beta_1 \sum x_i^2 $$
$$\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i = \beta_1 \left(\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right) ^2 \right) $$
$$\beta_1 = \frac{\sum x_iy_i - \frac{1}{n} \sum x_i \sum y_i}{\sum x_i^2 - \frac{1}{n} \left( \sum x_i \right) ^2 } = \frac{cov(x,y)}{var(x)}\tag{*}$$
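As a quick numerical check of the closed-form estimates above, here is a short Python sketch on synthetic data (the data-generating values are illustrative assumptions, not from the derivation): it computes $\beta_1$ and $\beta_0$ from the formulas marked $(*)$ and compares them against NumPy's least-squares fit.

```python
# Minimal sketch: closed-form simple-regression estimates vs. np.polyfit.
# The synthetic data below (true beta0 = 2, beta1 = 3) is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, n)

# beta1 = (sum x_i y_i - (1/n) sum x_i sum y_i) / (sum x_i^2 - (1/n)(sum x_i)^2)
beta1 = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x**2) - np.sum(x)**2 / n)
# beta0 = ybar - beta1 * xbar
beta0 = np.mean(y) - beta1 * np.mean(x)

# np.polyfit returns the coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(beta0, beta1)          # closed-form estimates
print(intercept, slope)      # should agree to numerical precision
```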
To establish a more general result, I refer to the lecture notes here.
Suppose we have the multiple linear regression model
$$y=X\beta + \varepsilon,$$
where the design matrix $X$ (with non-random entries) of order $n\times k$ has full column rank and $\beta=(\beta_1,\ldots,\beta_k)^T$ is the vector of regression coefficients.
Further assume that $\varepsilon \sim N_n(0,\sigma^2 I_n)$ with $\sigma^2$ unknown, so that $y\sim N_n(X\beta,\sigma^2 I_n)$.
Under this setting, we know that the OLS estimator of $\beta$ is $$\hat\beta=(X^T X)^{-1}X^T y\sim N_k\left(\beta,\sigma^2(X^T X)^{-1}\right)$$
So, since $X^T X\hat\beta=X^T y$ (the normal equations), $$(y-X\hat\beta)^T X(\hat\beta-\beta)=\left(y^TX-\hat\beta^T X^T X\right)(\hat\beta-\beta)=\left(y^TX-y^T X\right)(\hat\beta-\beta)=0$$
Hence,
\begin{align}
\left\| y-X\beta \right\|^2 &=\left\|y-X\hat\beta+X\hat\beta-X\beta\right\|^2
\\&=\left\|y-X\hat\beta\right\|^2+\left\|X\hat\beta-X\beta\right\|^2
\\&=\left\|y-X\hat\beta\right\|^2+\left\|X\hat\beta\right\|^2+\left\|X\beta\right\|^2-2\beta^T X^T X\hat\beta
\\&=\left\|y-X\hat\beta\right\|^2+\left\|X\hat\beta\right\|^2+\left\|X\beta\right\|^2-2\beta^T X^T y
\end{align}
where the last step again uses $X^T X\hat\beta=X^T y$.
The pdf of $y$ now looks like
\begin{align}
f(y;\beta,\sigma^2)&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\left\| y-X\beta \right\|^2\right]
\\&=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[\frac{1}{\sigma^2}\beta^T X^T y-\frac{1}{2\sigma^2}\left(\left\|y-X\hat\beta\right\|^2+\left\|X\hat\beta\right\|^2\right)-\frac{1}{2\sigma^2}\left\|X\beta\right\|^2\right]
\end{align}
Noting that $X\hat\beta$ is a function of $X^T y$ and that the above density is a member of the exponential family, a complete sufficient statistic for $(\beta,\sigma^2)$ is given by $$T=\left(X^Ty,\left\|y-X\hat\beta\right\|^2\right)$$
Now $\hat\beta$, being a function of $X^T y$, is also a function of $T$, and it is unbiased for $\beta$.
So, by the Lehmann–Scheffé theorem, $\hat\beta$ is the UMVUE of $\beta$.
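As a sanity check on the matrix results (not part of the argument above), here is a small Python sketch with a synthetic design matrix and noise level chosen purely for illustration: it computes $\hat\beta=(X^TX)^{-1}X^Ty$, compares it with NumPy's least-squares solver, and verifies the orthogonality $(y-X\hat\beta)^TX=0$ used in the decomposition.

```python
# Sketch: OLS estimator in matrix form and orthogonality of residuals to the columns of X.
# The design matrix, true beta, and sigma below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))                   # full column rank with probability 1
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(0, 0.3, n)          # eps ~ N(0, sigma^2 I_n)

# beta_hat = (X^T X)^{-1} X^T y, computed via a linear solve for stability
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Agrees with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))      # True

# Orthogonality used above: (y - X beta_hat)^T X = 0 (up to rounding)
print(np.allclose((y - X @ beta_hat) @ X, 0.0))
```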
Best Answer
$(1)\ E(\hat{Y_h}) = E(b_0 + b_1X_h) = \beta_0 +\beta_1X_h$
$(2)\ var(\hat{Y_h}) = var(b_0 + b_1X_h)$
An alternate (but equivalent) version of the regression model can be written as:
$Y_i = \beta_0X_0 + \beta_1X_i + \epsilon_i$
This model associates an $X$ variable with each coefficient (where $X_0 = 1$).
An alternative modification is to use the deviation $X_i -\bar{X}$ rather than $X_i$.
So the model can be written as:
$Y_i = \beta_0^* + \beta_1(X_i - \bar{X}) + \epsilon_i$
where $(3)\ \beta_0^* = \beta_0 + \beta_1\bar{X}$
These models can be used interchangeably.
We know from the normal equations:
$\Sigma Y_i = nb_0 + b_1\Sigma X_i$
Solving for $b_0$:
$(4)\ b_0 = \bar{Y} - b_1\bar{X}$
So substituting from (3) and (4):
$b_0^* = b_0 + b_1\bar{X} = (\bar{Y} - b_1\bar{X}) + b_1\bar{X} = \bar{Y}$
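A quick numerical illustration of this identity (with synthetic data, an assumption): fitting on the centered predictor $X_i - \bar{X}$ gives an intercept equal to $\bar{Y}$ while the slope is unchanged.

```python
# Sketch: the centered-predictor model has intercept b0* = Ybar and the same slope.
# The synthetic data below is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, 30)
y = 1.0 + 0.7 * x + rng.normal(0, 0.5, 30)

slope, intercept = np.polyfit(x, y, 1)                 # original parameterization
slope_c, intercept_c = np.polyfit(x - x.mean(), y, 1)  # centered predictor

print(np.isclose(intercept_c, y.mean()))               # True: b0* = Ybar
print(np.isclose(slope_c, slope))                      # True: slope unchanged
```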
$(5)\ var(\hat{Y_h}) = var(b_0 + b_1X_h) = var(\bar{Y} + b_1(X_h - \bar{X}))$
Using:
$var(\bar{Y}) = \frac{\sigma^2}{n}$
$var(aX) = a^2var(X)$
and
$var(X + Y) = var(X) + var(Y) + 2cov(X,Y)$
So:
$= var(\bar{Y}) + (X_h - \bar{X})^2var(b_1) + 2(X_h-\bar{X})cov(\bar{Y},b_1)$
We use the fact that $cov(\bar{Y},b_1) = 0$: since the $\epsilon_i$ are i.i.d., this covariance is proportional to $\Sigma(X_i - \bar{X})$, which is zero.
$= \frac{\sigma^2}{n} + (X_h-\bar{X})^2\frac{\sigma^2}{\Sigma(X_i-\bar{X})^2}$
$= \sigma^2[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\Sigma(X_i - \bar{X})^2}]$
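As a check on this result, here is a short Monte Carlo sketch (the fixed design, $\sigma$, $X_h$, and true coefficients below are assumptions for illustration): across repeated samples from the model, the empirical variance of $\hat{Y_h}$ should be close to $\sigma^2[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\Sigma(X_i - \bar{X})^2}]$.

```python
# Monte Carlo sketch: empirical variance of Y_h_hat vs. the formula derived above.
# The fixed design x, sigma, X_h, and true coefficients are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(3)
n, sigma, X_h = 25, 2.0, 7.0
x = np.linspace(0, 10, n)                     # fixed (non-random) design points
Sxx = np.sum((x - x.mean())**2)

preds = []
for _ in range(20000):
    y = 5.0 + 1.5 * x + rng.normal(0, sigma, n)          # true beta0 = 5, beta1 = 1.5
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx   # OLS slope
    b0 = y.mean() - b1 * x.mean()                        # OLS intercept
    preds.append(b0 + b1 * X_h)                          # fitted value at X_h

theory = sigma**2 * (1/n + (X_h - x.mean())**2 / Sxx)
print(np.var(preds), theory)                  # should be close
```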