Solved – In linear regression, are the noise terms independent of the coefficient estimators?

estimators, independence, linear model, regression

In the Wikipedia article on the bias-variance tradeoff, the independence of the estimator $\hat f(x)$ and the noise term $\epsilon$ is used in a crucial way in the proof of the decomposition of the mean square error. No justification for this independence is given, and I can't seem to figure it out. For example, if $f(t)=\beta_0 + \beta_1 t$, $Y_i=f(x_i) + \epsilon_i$ ($i=1,\ldots,n$), and $\hat f(x)=\hat\beta_0 + \hat\beta_1 x$ as in simple linear regression, are the $\epsilon_i$ independent of $\hat\beta_0$ and $\hat\beta_1$?

Best Answer

No, they're not independent. In multiple linear regression the OLS coefficient estimator can be written as:

$$\begin{equation} \begin{aligned} \hat{\boldsymbol{\beta}} &= (\mathbf{x}^\text{T} \mathbf{x})^{-1} (\mathbf{x}^\text{T} \mathbf{y}) \\[6pt] &= (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} (\mathbf{x} \boldsymbol{\beta} + \boldsymbol{\varepsilon}) \\[6pt] &= \boldsymbol{\beta} + (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \boldsymbol{\varepsilon}. \\[6pt] \end{aligned} \end{equation}$$
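To see the decomposition concretely, here is a minimal numerical sketch; the design, coefficients, and seed below are all made up for illustration:

```python
# Minimal sketch: check numerically that beta_hat = beta + (X'X)^{-1} X' eps.
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma = 50, np.array([2.0, -1.5]), 1.0

X = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix with intercept
eps = rng.normal(scale=sigma, size=n)                  # error vector
y = X @ beta + eps

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)              # OLS estimate, computed directly
beta_decomp = beta + np.linalg.solve(X.T @ X, X.T @ eps)  # the decomposition above

print(np.allclose(beta_hat, beta_decomp))  # True: the identity holds exactly
```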

In regression problems we analyse the behaviour of the estimators conditional on the explanatory variables (i.e., conditional on the design matrix $\mathbf{x}$). The covariance between the coefficient estimators and the errors is:

$$\begin{equation} \begin{aligned} \mathbb{Cov} ( \hat{\boldsymbol{\beta}}, \boldsymbol{\varepsilon} |\mathbf{x}) &= \mathbb{Cov} \Big( (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \boldsymbol{\varepsilon}, \boldsymbol{\varepsilon} \Big| \mathbf{x} \Big) \\[6pt] &= (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbb{Cov} ( \boldsymbol{\varepsilon}, \boldsymbol{\varepsilon} | \mathbf{x} ) \\[6pt] &= (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \mathbb{V} ( \boldsymbol{\varepsilon} | \mathbf{x} ) \\[6pt] &= \sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} \boldsymbol{I} \\[6pt] &= \sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T}. \\[6pt] \end{aligned} \end{equation}$$

In general, this covariance matrix is a non-zero matrix, and so the coefficient estimators are correlated with the error terms (conditional on the design matrix).
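If you want to confirm this empirically, a quick Monte Carlo check (illustrative setup: fixed design, resampled errors) compares the sample covariance between $\hat{\boldsymbol{\beta}}$ and $\boldsymbol{\varepsilon}$ with $\sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T}$:

```python
# Hold the design fixed, resample the error vector many times, and compare the
# empirical covariance Cov(beta_hat_j, eps_k) with sigma^2 (X'X)^{-1} X'.
import numpy as np

rng = np.random.default_rng(1)
n, sigma, reps = 20, 1.0, 200_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design matrix
H = np.linalg.solve(X.T @ X, X.T)                      # (X'X)^{-1} X'

eps = rng.normal(scale=sigma, size=(reps, n))          # reps draws of the error vector
dev = eps @ H.T                                        # beta_hat - beta = H eps, per draw

# Both factors have mean zero, so the average of dev * eps estimates the covariance.
emp_cov = dev.T @ eps / reps
print(np.max(np.abs(emp_cov - sigma**2 * H)))          # small (Monte Carlo error only)
```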


Special case (simple linear regression): In the special case where we have a simple linear regression with an intercept term and a single explanatory variable we have design matrix:

$$\mathbf{x} = \begin{bmatrix} 1 & x_1 \\[6pt] 1 & x_2 \\[6pt] \vdots & \vdots \\[6pt] 1 & x_n \\[6pt] \end{bmatrix},$$

which gives:

$$\begin{equation} \begin{aligned} (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T} &= \begin{bmatrix} n & & \sum x_i \\[6pt] \sum x_i & & \sum x_i^2 \\[6pt] \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 & \cdots & 1 \\[6pt] x_1 & x_2 & \cdots & x_n \\[6pt] \end{bmatrix} \\[6pt] &= \frac{1}{n \sum x_i^2 - (\sum x_i)^2} \begin{bmatrix} \sum x_i^2 & & -\sum x_i \\[6pt] -\sum x_i & & n \\[6pt] \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \\[6pt] x_1 & x_2 & \cdots & x_n \\[6pt] \end{bmatrix} \\[6pt] &= \frac{1}{n \sum x_i^2 - (\sum x_i)^2} \begin{bmatrix} \sum x_i(x_i-x_1) & \cdots & \sum x_i(x_i-x_n) \\[6pt] -\sum (x_i-x_1) & \cdots & -\sum (x_i-x_n) \\[6pt] \end{bmatrix}. \\[6pt] \end{aligned} \end{equation}$$

Hence, we have:

$$\begin{equation} \begin{aligned} \mathbb{Cov}(\hat{\beta}_0, \varepsilon_k) &= \sigma^2 \cdot \frac{\sum x_i(x_i-x_k)}{n \sum x_i^2 - (\sum x_i)^2}, \\[10pt] \mathbb{Cov}(\hat{\beta}_1, \varepsilon_k) &= - \sigma^2 \cdot \frac{\sum (x_i-x_k)}{n \sum x_i^2 - (\sum x_i)^2}. \\[10pt] \end{aligned} \end{equation}$$
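As a sanity check, these closed forms can be compared against the matrix expression $\sigma^2 (\mathbf{x}^\text{T} \mathbf{x})^{-1} \mathbf{x}^\text{T}$; the values of $x_i$ and $\sigma$ below are arbitrary:

```python
# Check that the closed-form covariances match sigma^2 (X'X)^{-1} X' row by row.
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 8, 1.3
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

D = n * np.sum(x**2) - np.sum(x)**2                    # common denominator
cov_b0 = sigma**2 * np.array([np.sum(x * (x - xk)) for xk in x]) / D
cov_b1 = -sigma**2 * np.array([np.sum(x - xk) for xk in x]) / D

matrix_form = sigma**2 * np.linalg.solve(X.T @ X, X.T)
print(np.allclose(matrix_form[0], cov_b0), np.allclose(matrix_form[1], cov_b1))  # True True
```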

We can also obtain the correlation, which is perhaps a bit more useful. To do this we note that:

$$\mathbb{V}(\varepsilon_k) = \sigma^2 \quad \quad \quad \mathbb{V}(\hat{\beta}_0) = \frac{\sigma^2 \sum x_i^2}{n \sum x_i^2 - (\sum x_i)^2} \quad \quad \quad \mathbb{V}(\hat{\beta}_1) = \frac{\sigma^2 n}{n \sum x_i^2 - (\sum x_i)^2}.$$

Hence, we have the correlations:

$$\begin{equation} \begin{aligned} \mathbb{Corr}(\hat{\beta}_0, \varepsilon_k) &= \frac{\sum x_i(x_i-x_k)}{\sqrt{(\sum x_i^2)(n \sum x_i^2 - (\sum x_i)^2)}}, \\[10pt] \mathbb{Corr}(\hat{\beta}_1, \varepsilon_k) &= - \frac{\sum (x_i-x_k)}{\sqrt{n(n \sum x_i^2 - (\sum x_i)^2)}}. \\[10pt] \end{aligned} \end{equation}$$
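These correlations can also be checked by simulation, again with an arbitrary fixed design; here the empirical correlation of $\hat{\beta}_1$ with one chosen error term is compared with the closed form:

```python
# Monte Carlo check of Corr(beta_hat_1, eps_k) against the closed form above.
import numpy as np

rng = np.random.default_rng(3)
n, sigma, reps = 6, 1.0, 300_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
H = np.linalg.solve(X.T @ X, X.T)                      # (X'X)^{-1} X'

eps = rng.normal(scale=sigma, size=(reps, n))
dev = eps @ H.T                                        # beta_hat - beta, per draw

k = 0                                                  # correlate against eps_k
emp = np.corrcoef(dev[:, 1], eps[:, k])[0, 1]

D = n * np.sum(x**2) - np.sum(x)**2
theory = -np.sum(x - x[k]) / np.sqrt(n * D)
print(emp, theory)                                     # agree up to Monte Carlo error
```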
