Solved – How to calculate the variance of the OLS estimator $\beta_0$, conditional on $x_1, \ldots , x_n$

regression, self-study

I know that
$$\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}$$
and this is how far I got when I calculated the variance:

\begin{align*}
Var(\hat{\beta_0}) &= Var(\bar{y} - \hat{\beta_1}\bar{x}) \\
&= Var((-\bar{x})\hat{\beta_1}+\bar{y}) \\
&= Var((-\bar{x})\hat{\beta_1})+Var(\bar{y}) \\
&= (-\bar{x})^2 Var(\hat{\beta_1}) + 0 \\
&= (\bar{x})^2 Var(\hat{\beta_1}) + 0 \\
&= \frac{\sigma^2 (\bar{x})^2}{\displaystyle\sum\limits_{i=1}^n (x_i - \bar{x})^2}
\end{align*}

but that's as far as I got. The final formula I'm trying to calculate is

\begin{align*}
Var(\hat{\beta_0}) &= \frac{\sigma^2 n^{-1}\displaystyle\sum\limits_{i=1}^n x_i^2}{\displaystyle\sum\limits_{i=1}^n (x_i - \bar{x})^2}
\end{align*}

I'm not sure how to get $$(\bar{x})^2 = \frac{1}{n}\displaystyle\sum\limits_{i=1}^n x_i^2$$ assuming my math is correct up to there.

Is this the right path?

\begin{align}
(\bar{x})^2 &= \left(\frac{1}{n}\displaystyle\sum\limits_{i=1}^n x_i\right)^2 \\
&= \frac{1}{n^2} \left(\displaystyle\sum\limits_{i=1}^n x_i\right)^2
\end{align}
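As a quick sanity check on that identity, here is a minimal Python sketch with arbitrary made-up numbers (not part of the derivation), just comparing the two quantities directly:

```python
import numpy as np

# Arbitrary made-up sample, purely to probe the identity
x = np.array([1.0, 2.0, 4.0, 7.0])

lhs = x.mean() ** 2        # (x-bar)^2
rhs = np.mean(x ** 2)      # (1/n) * sum of x_i^2

print(lhs, rhs)                                  # 12.25 vs 17.5 -- not equal
print(rhs - lhs, np.mean((x - x.mean()) ** 2))   # the gap equals (1/n) * sum of (x_i - x-bar)^2
```

So the two quantities differ in general, and the gap is exactly $\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2$.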

I'm sure it's simple, so the answer can wait for a bit if someone has a hint to push me in the right direction.

Best Answer

This is a self-study question, so I'll provide hints that will hopefully help you find the solution, and I'll edit the answer based on your feedback/progress.

The parameter estimates that minimize the sum of squares are \begin{align} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} , \\ \hat{\beta}_1 &= \frac{ \sum_{i = 1}^n(x_i - \bar{x})y_i }{ \sum_{i = 1}^n(x_i - \bar{x})^2 } . \end{align} To get the variance of $\hat{\beta}_0$, start from its expression and substitute the expression of $\hat{\beta}_1$, and do the algebra $$ {\rm Var}(\hat{\beta}_0) = {\rm Var} (\bar{Y} - \hat{\beta}_1 \bar{x}) = \ldots $$
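If it helps to check intermediate work numerically, here is a minimal Python sketch (simulated data, arbitrary true coefficients) that computes these closed-form estimates and compares them against a library fit:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.5, size=n)   # true beta0 = 2, beta1 = 3 (arbitrary)

xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar

# np.polyfit returns coefficients highest degree first: [beta1, beta0]
beta1_ref, beta0_ref = np.polyfit(x, y, deg=1)
print(beta0_hat, beta1_hat)
print(beta0_ref, beta1_ref)   # should match to floating-point precision
```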

Edit:
We have \begin{align} {\rm Var}(\hat{\beta}_0) &= {\rm Var} (\bar{Y} - \hat{\beta}_1 \bar{x}) \\ &= {\rm Var} (\bar{Y}) + (\bar{x})^2 {\rm Var} (\hat{\beta}_1) - 2 \bar{x} {\rm Cov} (\bar{Y}, \hat{\beta}_1). \end{align} The two variance terms are $$ {\rm Var} (\bar{Y}) = {\rm Var} \left(\frac{1}{n} \sum_{i = 1}^n Y_i \right) = \frac{1}{n^2} \sum_{i = 1}^n {\rm Var} (Y_i) = \frac{\sigma^2}{n}, $$ and \begin{align} {\rm Var} (\hat{\beta}_1) &= {\rm Var} \left( \frac{ \sum_{i = 1}^n(x_i - \bar{x})Y_i }{ \sum_{i = 1}^n(x_i - \bar{x})^2 } \right) \\ &= \frac{ 1 }{ \left[\sum_{i = 1}^n(x_i - \bar{x})^2 \right]^2 } \sum_{i = 1}^n(x_i - \bar{x})^2 {\rm Var} (Y_i) \\ &= \frac{ \sigma^2 }{ \sum_{i = 1}^n(x_i - \bar{x})^2 } , \end{align} and the covariance term is \begin{align} {\rm Cov} (\bar{Y}, \hat{\beta}_1) &= {\rm Cov} \left\{ \frac{1}{n} \sum_{i = 1}^n Y_i, \frac{ \sum_{j = 1}^n(x_j - \bar{x})Y_j }{ \sum_{i = 1}^n(x_i - \bar{x})^2 } \right\} \\ &= \frac{1}{n} \frac{ 1 }{ \sum_{i = 1}^n(x_i - \bar{x})^2 } {\rm Cov} \left\{ \sum_{i = 1}^n Y_i, \sum_{j = 1}^n(x_j - \bar{x})Y_j \right\} \\ &= \frac{ 1 }{ n \sum_{i = 1}^n(x_i - \bar{x})^2 } \sum_{i = 1}^n \sum_{j = 1}^n (x_j - \bar{x}) {\rm Cov}(Y_i, Y_j) \\ &= \frac{ 1 }{ n \sum_{i = 1}^n(x_i - \bar{x})^2 } \sum_{i = 1}^n (x_i - \bar{x}) \sigma^2 \\ &= 0 , \end{align} since ${\rm Cov}(Y_i, Y_j) = 0$ for $i \neq j$ and $\sum_{i = 1}^n (x_i - \bar{x}) = 0$.
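A quick Monte Carlo sketch (Python, with an arbitrary fixed design and arbitrary true coefficients) can confirm these three terms numerically, keeping the $x_i$ fixed across replications since everything is conditional on them:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 30, 2.0
x = rng.uniform(0, 5, size=n)            # design points, held fixed across replications
Sxx = np.sum((x - x.mean()) ** 2)

ybars, beta1_hats = [], []
for _ in range(20000):
    y = 1.0 + 0.5 * x + rng.normal(0.0, sigma, size=n)
    beta1_hats.append(np.sum((x - x.mean()) * y) / Sxx)
    ybars.append(y.mean())
ybars, beta1_hats = np.array(ybars), np.array(beta1_hats)

print(np.var(ybars), sigma**2 / n)            # Var(Ybar)       vs sigma^2 / n
print(np.var(beta1_hats), sigma**2 / Sxx)     # Var(beta1_hat)  vs sigma^2 / Sxx
print(np.cov(ybars, beta1_hats)[0, 1])        # Cov(Ybar, beta1_hat), close to 0
```

With the three terms in hand, it only remains to combine them.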
And since $$\sum_{i = 1}^n(x_i - \bar{x})^2 = \sum_{i = 1}^n x_i^2 - 2 \bar{x} \sum_{i = 1}^n x_i + \sum_{i = 1}^n \bar{x}^2 = \sum_{i = 1}^n x_i^2 - n \bar{x}^2, $$ we have \begin{align} {\rm Var}(\hat{\beta}_0) &= \frac{\sigma^2}{n} + \frac{ \sigma^2 \bar{x}^2}{ \sum_{i = 1}^n(x_i - \bar{x})^2 } \\ &= \frac{\sigma^2 }{ n \sum_{i = 1}^n(x_i - \bar{x})^2 } \left\{ \sum_{i = 1}^n(x_i - \bar{x})^2 + n \bar{x}^2 \right\} \\ &= \frac{\sigma^2 \sum_{i = 1}^n x_i^2}{ n \sum_{i = 1}^n(x_i - \bar{x})^2 }. \end{align}
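The final formula can be checked the same way with a small simulation sketch (again with arbitrary values for $n$, $\sigma$ and the coefficients):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, beta0, beta1 = 30, 2.0, 1.0, 0.5     # arbitrary values for the sketch
x = rng.uniform(0, 5, size=n)                  # fixed design
Sxx = np.sum((x - x.mean()) ** 2)

beta0_hats = []
for _ in range(20000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    beta0_hats.append(y.mean() - b1 * x.mean())

print(np.var(beta0_hats))                      # Monte Carlo Var(beta0_hat)
print(sigma**2 * np.sum(x**2) / (n * Sxx))     # sigma^2 * sum(x_i^2) / (n * Sxx)
```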

Edit 2

Why do we have ${\rm Var} \left( \sum_{i = 1}^n Y_i \right) = \sum_{i = 1}^n {\rm Var} (Y_i)$?

The assumed model is $ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$, where the $\epsilon_i$ are independent and identically distributed random variables with ${\rm E}(\epsilon_i) = 0$ and ${\rm Var}(\epsilon_i) = \sigma^2$.

Once we have a sample, the $X_i$ are known, so the only random terms are the $\epsilon_i$. Recall that for a random variable $Z$ and a constant $a$, we have ${\rm Var}(a+Z) = {\rm Var}(Z)$. Thus, \begin{align} {\rm Var} \left( \sum_{i = 1}^n Y_i \right) &= {\rm Var} \left( \sum_{i = 1}^n (\beta_0 + \beta_1 X_i + \epsilon_i) \right)\\ &= {\rm Var} \left( \sum_{i = 1}^n \epsilon_i \right) = \sum_{i = 1}^n \sum_{j = 1}^n {\rm Cov} (\epsilon_i, \epsilon_j)\\ &= \sum_{i = 1}^n {\rm Cov} (\epsilon_i, \epsilon_i) = \sum_{i = 1}^n {\rm Var} (\epsilon_i)\\ &= \sum_{i = 1}^n {\rm Var} (\beta_0 + \beta_1 X_i + \epsilon_i) = \sum_{i = 1}^n {\rm Var} (Y_i).\\ \end{align} The fourth equality holds because ${\rm Cov} (\epsilon_i, \epsilon_j) = 0$ for $i \neq j$ by the independence of the $\epsilon_i$.
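If you want to see this numerically, here is a minimal simulation sketch (arbitrary values, independent errors) comparing ${\rm Var}\left(\sum_{i=1}^n Y_i\right)$ with $n\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma, beta0, beta1 = 10, 1.0, 2.0, 3.0     # arbitrary values for the sketch
x = rng.uniform(0, 1, size=n)                  # fixed design, so only the errors are random

totals = []
for _ in range(50000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=n)   # independent errors
    totals.append(y.sum())

print(np.var(totals))       # Monte Carlo Var(sum of Y_i)
print(n * sigma**2)         # sum of Var(Y_i) = n * sigma^2
```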