Solved – Prove that the OLS estimator of the intercept is BLUE


Consider the simple linear regression model
$$y_i = \alpha + \beta x_i + u_i$$
under the classical Gauss-Markov assumptions. In proving that $\hat{\beta}$, the OLS estimator for $\beta$, is the best linear unbiased estimator (BLUE), one approach is to define an alternative estimator as a weighted sum of the $y_i$:
$$\tilde{\beta} = \sum_{i=1}^n c_i y_i$$
Then, we define $c_i = k_i + d_i$, where $k_i = \frac{x_i - \bar{x}}{\sum_{i=1}^n (x_i - \bar{x})^2}$ and so the OLS estimator for $\beta$ can be written in the form $\hat{\beta} = \sum_{i=1}^n k_i y_i$. To show that $\hat{\beta}$ is BLUE, the alternative estimator can be written as:
$$\tilde{\beta} = \hat{\beta} + \sum_{i=1}^n d_i y_i$$
Hence, its variance can be written:
$$Var(\tilde{\beta}) = Var(\hat{\beta}) + \sum_{i=1}^n d_i^2 Var(y_i) + 2\sum_{i=1}^n k_i d_i Var(y_i)$$
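As a quick numerical sanity check of the linear form $\hat{\beta} = \sum_{i=1}^n k_i y_i$ used above, here is a minimal Python sketch on toy data of my own (the data and variable names are just for illustration):

```python
import numpy as np

# Toy-data sanity check (my own illustration) that the OLS slope equals
# the weighted sum sum_i k_i * y_i with k_i = (x_i - xbar) / sum_j (x_j - xbar)^2.
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)   # arbitrary alpha = 2, beta = 3

xbar = x.mean()
S_xx = np.sum((x - xbar) ** 2)
k = (x - xbar) / S_xx                    # OLS weights for the slope

beta_hat = np.sum((x - xbar) * (y - y.mean())) / S_xx   # textbook OLS slope
print(np.isclose(beta_hat, np.sum(k * y)))              # True
```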
For the cross term:
\begin{align*}
\sum_{i=1}^n k_i d_i &= \sum_{i=1}^n k_i(c_i - k_i) \\
&= \sum_{i=1}^n k_i c_i - \sum_{i=1}^n k_i^2 \\
&= \sum_{i=1}^n c_i \bigg(\frac{x_i - \bar{x}}{\sum_{i=1}^n (x_i - \bar{x})^2} \bigg) - \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2} \\
&= \bigg(\frac{\sum_{i=1}^n c_i x_i - \bar{x} \sum_{i=1}^n c_i}{\sum_{i=1}^n (x_i - \bar{x})^2} \bigg) - \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2}
\end{align*}

Given the linear form, unbiasedness requires $\sum_{i=1}^n c_i x_i = 1$ and $\sum_{i=1}^n c_i = 0$, since $E(\tilde{\beta}) = \alpha \sum_{i=1}^n c_i + \beta \sum_{i=1}^n c_i x_i$ must equal $\beta$ for all values of $\alpha$ and $\beta$, so:
\begin{align*}
\sum_{i=1}^n k_i d_i &= \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2} - \frac{1}{\sum_{i=1}^n (x_i - \bar{x})^2} = 0
\end{align*}
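This identity can also be checked numerically for an arbitrary set of weights $c_i$ satisfying the two constraints; the construction of $d_i$ below is my own:

```python
import numpy as np

# Numerical check (my own construction) that sum_i k_i * d_i = 0 whenever
# c_i = k_i + d_i satisfies sum_i c_i = 0 and sum_i c_i x_i = 1.
rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)

xbar = x.mean()
S_xx = np.sum((x - xbar) ** 2)
k = (x - xbar) / S_xx        # the OLS weights already satisfy both constraints

# Build d with sum(d) = 0 and sum(d * x) = 0, so c = k + d still satisfies
# the constraints: take the residual of a random vector on [1, x].
Z = np.column_stack([np.ones(n), x])
z = rng.normal(size=n)
d = z - Z @ np.linalg.lstsq(Z, z, rcond=None)[0]
c = k + d

print(np.isclose(np.sum(c), 0.0), np.isclose(np.sum(c * x), 1.0))  # True True
print(np.isclose(np.sum(k * d), 0.0))                              # True
```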

The third term in the expression for $Var(\tilde{\beta})$ therefore drops out, and it is plain that any alternative linear unbiased estimator $\tilde{\beta}$ of $\beta$ has a variance at least as large as that of $\hat{\beta}$: so the OLS estimator is BLUE. I want to prove that $\hat{\alpha}$, the OLS estimator for the intercept $\alpha$, is BLUE in the same way, but I'm having difficulty determining what value to assign to $k_i$ so that $\hat{\alpha} = \sum_{i=1}^n k_i y_i$. So far, what I have is:
\begin{align*}
\hat{\alpha} &= \bar{y} - \hat{\beta} \bar{x} \\
&= \frac{1}{n} \sum_{i=1}^n y_i - \frac{\sum_{i=1}^n (x_i - \bar{x})y_i}{\sum_{i=1}^n (x_i - \bar{x})^2} \bar{x} \\
&= \sum_{i=1}^n y_i \bigg[\frac{1}{n} - \frac{(x_i - \bar{x})\bar{x}}{\sum_{i=1}^n (x_i - \bar{x})^2} \bigg]\\
&= \sum_{i=1}^n k_i y_i
\end{align*}

where $k_i = \frac{1}{n} - \frac{(x_i - \bar{x})\bar{x}}{\sum_{i=1}^n (x_i - \bar{x})^2}$, but things seem to go awry when I work through the rest of the proof.
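A quick numerical check of this $k_i$ on toy data (again a sketch of my own): it does reproduce $\hat{\alpha}$, and it satisfies $\sum_{i=1}^n k_i = 1$ and $\sum_{i=1}^n k_i x_i = 0$, which appear to be the intercept analogues of the constraints above.

```python
import numpy as np

# Toy-data check (my own sketch) of the proposed intercept weights
# k_i = 1/n - (x_i - xbar) * xbar / S_xx: they reproduce alpha_hat and
# satisfy sum(k) = 1 and sum(k * x) = 0.
rng = np.random.default_rng(2)
n = 50
x = rng.normal(loc=1.0, size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)

xbar, ybar = x.mean(), y.mean()
S_xx = np.sum((x - xbar) ** 2)
beta_hat = np.sum((x - xbar) * y) / S_xx
alpha_hat = ybar - beta_hat * xbar

k = 1.0 / n - (x - xbar) * xbar / S_xx
print(np.isclose(np.sum(k * y), alpha_hat))                        # True
print(np.isclose(np.sum(k), 1.0), np.isclose(np.sum(k * x), 0.0))  # True True
```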

Best Answer

This is one of those theorems that is easier to prove in greater generality using vector algebra than it is to prove with scalar algebra. To do this, consider the multiple linear regression model $\mathbf{Y} = \mathbf{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and the general linear estimator:

$$\hat{\boldsymbol{\beta}}_\mathbf{A} = \hat{\boldsymbol{\beta}}_\text{OLS} + \mathbf{A} \mathbf{Y} = [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbf{Y}.$$

Since the OLS estimator is unbiased and $\mathbb{E}(\mathbf{Y}) = \mathbf{x} \boldsymbol{\beta}$, this general linear estimator has bias:

$$\begin{align} \text{Bias}(\hat{\boldsymbol{\beta}}_\mathbf{A}, \boldsymbol{\beta}) &\equiv \mathbb{E}(\hat{\boldsymbol{\beta}}_\mathbf{A}) - \boldsymbol{\beta} \\[6pt] &= \mathbb{E}(\hat{\boldsymbol{\beta}}_\text{OLS} + \mathbf{A} \mathbf{Y}) - \boldsymbol{\beta} \\[6pt] &= \boldsymbol{\beta} + \mathbf{A} \mathbf{x} \boldsymbol{\beta} - \boldsymbol{\beta} \\[6pt] &= \mathbf{A} \mathbf{x} \boldsymbol{\beta}, \\[6pt] \end{align}$$

and so the requirement of unbiasedness imposes the restriction that $\mathbf{A} \mathbf{x} = \mathbf{0}$. The variance of the general linear estimator is:

$$\begin{align} \mathbb{V}(\hat{\boldsymbol{\beta}}_\mathbf{A}) &= \mathbb{V}([(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbf{Y}) \\[6pt] &= [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] \mathbb{V}(\mathbf{Y}) [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}]^\text{T} \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}]^\text{T} \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} + \mathbf{A}] [\mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbf{A}^\text{T} + \mathbf{A} \mathbf{x} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} (\mathbf{A} \mathbf{x})^\text{T} + (\mathbf{A} \mathbf{x}) (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{0}^\text{T} + \mathbf{0} (\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}] \\[6pt] &= \sigma^2 [(\mathbf{x}^\text{T} \mathbf{x})^{-1} + \mathbf{A} \mathbf{A}^\text{T}]. \\[6pt] \end{align}$$
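As a numerical sanity check of this formula, here is a minimal Python sketch (my own construction; I write the design matrix as X and build an A with $\mathbf{A} \mathbf{x} = \mathbf{0}$ by making its rows orthogonal to the column space of the design matrix):

```python
import numpy as np

# Numerical check (my own construction) that for any A with A X = 0, the
# covariance of [(X'X)^{-1} X' + A] Y under V(Y) = sigma^2 I equals
# sigma^2 [(X'X)^{-1} + A A'].
rng = np.random.default_rng(3)
n, p = 60, 2
sigma2 = 1.7
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one regressor

# Rows of A orthogonal to the column space of X guarantee A X = 0:
# A = M (I - H) with H the hat matrix and M arbitrary.
H = X @ np.linalg.solve(X.T @ X, X.T)
A = rng.normal(size=(p, n)) @ (np.eye(n) - H)
print(np.allclose(A @ X, 0.0))                          # True

W = np.linalg.solve(X.T @ X, X.T) + A                   # weights of the estimator
cov_direct = W @ (sigma2 * np.eye(n)) @ W.T             # W V(Y) W'
cov_formula = sigma2 * (np.linalg.inv(X.T @ X) + A @ A.T)
print(np.allclose(cov_direct, cov_formula))             # True
```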

Hence, subtracting the OLS variance $\mathbb{V}(\hat{\boldsymbol{\beta}}_\text{OLS}) = \sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1}$ (the special case $\mathbf{A} = \mathbf{0}$), we have:

$$\mathbb{V}(\hat{\boldsymbol{\beta}}_\mathbf{A}) - \mathbb{V}(\hat{\boldsymbol{\beta}}_\text{OLS}) = \sigma^2 \mathbf{A} \mathbf{A}^\text{T}.$$

Now, since $\mathbf{A} \mathbf{A}^\text{T}$ is a positive semi-definite matrix (and equals zero only when $\mathbf{A} = \mathbf{0}$), we can see that the variance of the general linear estimator is minimised when $\mathbf{A} = \mathbf{0}$, which yields the OLS estimator. Since the result holds jointly for every coefficient, it applies in particular to the intercept: the corresponding diagonal element of $\sigma^2 \mathbf{A} \mathbf{A}^\text{T}$ is nonnegative, so no linear unbiased estimator of $\alpha$ can have a smaller variance than $\hat{\alpha}$.
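For completeness, a small numerical illustration (my own sketch) of that semi-definiteness, which is what rules out any negative diagonal element in the variance difference:

```python
import numpy as np

# Illustration (my own sketch) that A A' is positive semi-definite for any
# real A: v' (A A') v = ||A' v||^2 >= 0, so no eigenvalue and no diagonal
# entry of the variance difference can be negative.
rng = np.random.default_rng(4)
A = rng.normal(size=(2, 60))           # arbitrary p x n matrix
G = A @ A.T

print(np.all(np.linalg.eigvalsh(G) >= -1e-12))   # True: nonnegative eigenvalues
print(np.all(np.diag(G) >= 0.0))                 # True: nonnegative diagonal
```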