Estimating the slope in the simple linear regression model $y=\beta_0+\beta_1 x+\epsilon$

Tags: linear regression, regression, regression analysis

I have two formulas for the estimate of the slope in the simple linear regression model $y=\beta_0+\beta_1 x+\epsilon$:

  • $\hat{\beta_1}=\frac{\sum_{i=1}^N(x_i-\bar{x})(y_i-\bar{y})}{\sum^N_{i=1}(x_i-\bar{x})^2}$

  • $\hat{\beta_1}=(X_{1\times N}'X_{N\times1})^{-1}X'_{1\times N}Y_{N\times 1}=\frac{\sum_{i=1}^Nx_iy_i}{\sum^N_{i=1}x_i^2}$

These formulas are not equivalent, yet each appears in the literature as the main formula for estimating $\beta_1$. When should each formula be used? And what is the motivation for using the averages of the $x$'s and $y$'s in the first formula?
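To make the difference concrete, here is a quick NumPy sketch (the data are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up illustrative data (not centered, so the two formulas differ).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=x.size)

# Formula 1: centered sums (slope of the model with an intercept).
b1_centered = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Formula 2: raw sums (slope of the no-intercept model).
b1_raw = np.sum(x * y) / np.sum(x ** 2)

print(b1_centered)  # close to the true slope 0.5
print(b1_raw)       # noticeably different: the fit is forced through the origin
```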

The second part of my question:

If I try to find the variance of $\hat{\beta}_1$ from the first formula, I get
$$Var(\hat{\beta_1})=\frac{\sum_{i=1}^N(x_i-\bar{x})^2Var(y_i-\bar{y})}{\big[\sum^N_{i=1}(x_i-\bar{x})^2\big]^2}=\frac{(\sigma^2-\frac{\sigma^2}{N})}{\sum^N_{i=1}(x_i-\bar{x})^2}$$
but the answer should be
$$\frac{\sigma^2}{\sum^N_{i=1}(x_i-\bar{x})^2}$$
Where is my mistake?

Best Answer

First question

The first $\hat{\beta}_1$ is the OLS estimator for the model \begin{equation} y_i = \beta_0 + \beta_1 x_i + \epsilon_i \end{equation} The second OLS estimator corresponds to the model \begin{equation} y_i = \beta_1 x_i + \epsilon_i \end{equation} i.e. one with no intercept. Redo the math assuming the first model, i.e. in matrix form you have \begin{equation} X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix} \end{equation} and you will see that the two estimators coincide.
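A quick numerical check of this claim (a NumPy sketch; the simulated data and parameter values are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Design matrix with a column of ones (the model with an intercept).
X = np.column_stack([np.ones_like(x), x])

# Full OLS via the normal equations: solve (X'X) beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Centered formula for the slope.
b1_centered = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(np.isclose(beta_hat[1], b1_centered))  # True: the two slopes agree
```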


Second question

Let's rewrite the numerator of $\hat{\beta}_1$:

\begin{align} \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i (x_i - \bar{x})y_i - \sum_i (x_i - \bar{x})\bar{y} \tag{1} \end{align}

Let's work with the second term a bit:

\begin{align} \sum_i (x_i - \bar{x})\bar{y} &= \bar{y}\sum_i (x_i - \bar{x})\\ &= \bar{y}\left(\left(\sum_i x_i\right) - N\bar{x}\right)\\ &= \bar{y}\left(N\bar{x} - N\bar{x}\right)\\ &= 0 \end{align}

So equation $(1)$ becomes

\begin{equation} \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \sum_i (x_i - \bar{x})y_i = \sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i + \epsilon_i) \end{equation}

Since the $x_i$ are fixed, $\sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i)$ is a non-random constant and contributes nothing to the variance. You now get

\begin{align} \text{Var}(\hat{\beta}_1) & = \text{Var} \left(\frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} \right) \\ &= \text{Var} \left(\frac{\sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i + \epsilon_i)}{\sum_i (x_i - \bar{x})^2} \right)\\ &= \text{Var} \left(\frac{\sum_i (x_i - \bar{x})\epsilon_i}{\sum_i (x_i - \bar{x})^2} \right)\\ &= \frac{\sum_i (x_i - \bar{x})^2\,\text{Var}(\epsilon_i)}{\left(\sum_i (x_i - \bar{x})^2\right)^2}\\ &= \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2} \end{align}

where the fourth line uses the independence of the $\epsilon_i$. Your mistake was pulling $\text{Var}(y_i - \bar{y})$ out of the sum term by term: the $y_i - \bar{y}$ are not independent across $i$ (each one contains $\bar{y}$), so the variance of the sum is not the sum of those variances.
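You can verify the result by Monte Carlo simulation. A minimal sketch, assuming a fixed design and i.i.d. normal errors (the particular $x$ grid and parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed design, as assumed in the derivation above.
x = np.linspace(0.0, 10.0, 30)
beta0, beta1, sigma = 2.0, 0.5, 1.5
sxx = np.sum((x - x.mean()) ** 2)

# Simulate many datasets and collect the slope estimates.
n_sims = 100_000
eps = rng.normal(scale=sigma, size=(n_sims, x.size))
y = beta0 + beta1 * x + eps
b1 = ((x - x.mean()) * (y - y.mean(axis=1, keepdims=True))).sum(axis=1) / sxx

print(b1.var())        # empirical variance of the slope estimates
print(sigma**2 / sxx)  # theoretical value: the two should be close
```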