Zero Covariance vs Independence of Slope and Intercept Estimators in Linear Models with Least Squares

Tags: covariance, independence, least-squares, mathematical-statistics, self-study

$\newcommand{\Cov}{\operatorname{Cov}}$Problem Statement: Under the assumptions of Exercise 11.16, find
$\Cov\big(\hat\beta_0,\hat\beta_1\big).$ Use this answer to show that
$\hat\beta_0$ and $\hat\beta_1$ are independent if $\displaystyle\sum_{i=1}^n
x_i=0.$
[Hint: $\Cov\big(\hat\beta_0,\hat\beta_1\big)=
\Cov\big(\overline{Y}-\hat\beta_1\overline{x},\hat\beta_1\big).$
Use Theorem
5.12 and the results of this section.]

Note: This is Problem 11.17 in Mathematical Statistics with Applications, 5th Ed., by Wackerly, Mendenhall, and Scheaffer.

My Work So Far: The assumptions of Exercise 11.16 are that $Y_1, Y_2,\dots,Y_n$ are independent normal random variables with $E(Y_i)=\beta_0+\beta_1 x_i$ and $V(Y_i)=\sigma^2.$ The first part of this question is largely done for us in the book.
That is, it is derived that
$$\Cov\big(\hat\beta_0,\hat\beta_1\big)
=-\frac{\overline{x}\,\sigma^2}{\displaystyle\sum_{i=1}^n(x_i-\overline{x})^2},$$

where $\operatorname{Var}(Y_i)=\sigma^2.$
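
For completeness, here is a sketch of the computation the hint suggests, using the standard facts that $\Cov\big(\overline{Y},\hat\beta_1\big)=0$ (because $\sum_{i=1}^n(x_i-\overline{x})=0$) and $V\big(\hat\beta_1\big)=\sigma^2\big/\sum_{i=1}^n(x_i-\overline{x})^2$:
$$\Cov\big(\hat\beta_0,\hat\beta_1\big)
=\Cov\big(\overline{Y}-\hat\beta_1\overline{x},\ \hat\beta_1\big)
=\Cov\big(\overline{Y},\hat\beta_1\big)-\overline{x}\,V\big(\hat\beta_1\big)
=-\frac{\overline{x}\,\sigma^2}{\displaystyle\sum_{i=1}^n(x_i-\overline{x})^2}.$$
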
Now $\overline{x}=0$ if and only if $\sum_{i=1}^n x_i=0$, so if the sum is zero, the covariance is zero. However, the fact that $\hat\beta_0$ and $\hat\beta_1$ are each normally distributed with zero covariance does not by itself make them independent; that would only follow if they were jointly (bivariate) normally distributed.

My Questions: Is what I'm being asked to show even true? That is, is there something about $\hat\beta_0$ and $\hat\beta_1$ being OLS estimators that makes this result hold? Or can I show that they are jointly (bivariate) normally distributed? Zero covariance does not imply independence in general; why should it do so in this situation?

Note 1: in silverfish's answer to this question, it is mentioned in the paragraph beginning with "These two uncertainties apply independently…" that these two uncertainties "…should be technically independent." But it is not proven there, though it is intuitively explained and I could believe it.

Note 2: In this thread, Alecos simply makes the argument that I think the book wants here, but doesn't say anything about why zero covariance implies independence.

Note 3: I have reviewed a few other threads related to this, but none of them answers the main question of why zero covariance should imply independence in this situation, when it doesn't in general.

Best Answer

$\newcommand{\one}{\mathbf 1}\newcommand{\e}{\varepsilon}$I would just go for a linear algebra approach since then we get joint normality easily. You have $y = X\beta + \e$ with $X = (\one \mid x)$ and $\e\sim\mathcal N(\mathbf 0, \sigma^2 I)$.

We know $$ \hat\beta = (X^TX)^{-1}X^Ty \sim \mathcal N(\beta, \sigma^2 (X^TX)^{-1}) $$ where $$ (X^TX)^{-1} = \begin{bmatrix} n & n \bar x \\ n \bar x & x^Tx\end{bmatrix}^{-1} = \frac{1}{x^Tx - n\bar x^2}\begin{bmatrix} x^Tx/n & - \bar x \\ - \bar x & 1\end{bmatrix}. $$ By assumption $X$ is full rank, which in this case means $x$ is not constant (since the only way to be low rank is for $x$ to be in the span of $\one$). This means $\det X^TX \neq 0$, so $\text{Cov}(\hat\beta_0, \hat\beta_1) = 0$ if and only if $\bar x = 0$. Since $(\hat\beta_0, \hat\beta_1)$ is bivariate normal, zero covariance here is equivalent to independence.
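
To make this concrete, here is a small simulation sketch of my own (not from the original answer; the variable names are hypothetical). It draws many response vectors from the model with a centered $x$, computes the OLS estimates, and checks that their empirical covariance matches $\sigma^2(X^TX)^{-1}$, whose off-diagonal entry vanishes precisely because $\bar x = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, n_sims = 25, 2.0, 200_000
beta = np.array([1.0, 3.0])           # true (beta_0, beta_1)

x = rng.normal(size=n)
x = x - x.mean()                      # center x so that x-bar = 0
X = np.column_stack([np.ones(n), x])  # design matrix (1 | x)

# Simulate y = X beta + eps and compute the OLS estimates for each replication.
eps = rng.normal(scale=sigma, size=(n_sims, n))
Y = X @ beta + eps                    # each row is one simulated response vector
XtX_inv = np.linalg.inv(X.T @ X)
betas = Y @ X @ XtX_inv               # row i holds (beta0_hat, beta1_hat) for y_i

print(np.cov(betas.T))                # empirical covariance of the estimates
print(sigma**2 * XtX_inv)             # theoretical covariance; off-diagonal is 0
```
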


Here's a different approach that avoids using the normal equations. We know $$ \hat\beta_0 = \bar y - \hat\beta_1 \bar x \\ \hat\beta_1 = \frac{\text{Cov}(x,y)}{\text{Var}(x)} $$ and we want to show $\bar x = 0 \implies \hat\beta_0 \perp \hat\beta_1$, where I'm using "$\perp$" to denote independence.
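
To be explicit about the notation, $\text{Cov}(x,y)$ and $\text{Var}(x)$ above are the sample covariance and sample variance, so the slope estimator can be written as
$$ \hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2} = \frac{x^T(y - \bar y\,\one)}{x^Tx - n\bar x^2}, $$
which is the form used in the next display once $\bar x = 0$ and $x^Tx = 1$.
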

Without losing any generality I'll assume $x^Tx = 1$ (this preserves $\bar x = 0$). Then under the assumption of $\bar x = 0$ we have $$ \hat\beta_0 = \bar y = n^{-1}\one^Ty \\ \hat\beta_1 = x^Ty - \bar y x^T\one = x^Ty. $$

This means $$ {\hat\beta_0 \choose \hat\beta_1} = (n^{-1}\one \mid x)^Ty $$ so this is a linear transformation of a Gaussian and is in turn Gaussian, and the covariance matrix is proportional to $$ (n^{-1}\one \mid x)^T(n^{-1}\one \mid x) = \begin{bmatrix} n^{-1} & 0 \\ 0 & 1\end{bmatrix} $$ which gives us independence.
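
For reference, the general fact being used here (stated in my own notation, with $B := (n^{-1}\one \mid x)$) is that a fixed linear transformation of a Gaussian vector is again Gaussian: $$ y \sim \mathcal N(X\beta, \sigma^2 I) \quad\Longrightarrow\quad B^Ty \sim \mathcal N\big(B^TX\beta,\ \sigma^2 B^TB\big). $$ So joint normality of $(\hat\beta_0, \hat\beta_1)$ comes for free, and a diagonal $B^TB$ is exactly what upgrades zero covariance to independence.
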


This result can be generalized by noting that $\bar x = 0$ is equivalent to having an orthogonal design matrix in this case.

Suppose now we have an $n\times p$ full column rank covariate matrix $X$ which is partitioned as $X = (Z\mid W)$ where $Z$ has orthonormal columns and $W$ is unconstrained.

If all of the columns of $X$ are orthonormal, i.e. $X=Z$, the result is easy since $X^TX = I$, so $$ \hat\beta \sim \mathcal N(\beta, \sigma^2I). $$

I'll prove the following more interesting result: letting $\hat\beta_A$ denote the vector of coefficients for block $A$ of $X$, the elements of $\hat\beta_Z$ are conditionally independent given $\hat\beta_W$.

This can be shown by directly computing the covariance matrix of $\hat\beta_Z \mid \hat\beta_W$ and since $\hat\beta_Z\mid\hat\beta_W$ is still multivariate Gaussian, this gives us independence. I'll take $\sigma^2 = 1$ without losing any generality.
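
The computation uses the standard conditional-covariance formula for a multivariate Gaussian (stated here for reference): if $(u, v)$ is jointly Gaussian with covariance blocks $\Sigma_{uu}$, $\Sigma_{uv}$, $\Sigma_{vv}$, then $$ \text{Var}(u \mid v) = \Sigma_{uu} - \Sigma_{uv}\Sigma_{vv}^{-1}\Sigma_{vu}, $$ applied below with $u = \hat\beta_Z$ and $v = \hat\beta_W$.
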

I'll start with the full covariance matrix of $\hat\beta$, which is proportional to $(X^TX)^{-1}$. $X^TX$ is a $2\times 2$ block matrix so we can invert it as $$ (X^TX)^{-1} = \begin{bmatrix}I & Z^TW \\ W^TZ & W^TW\end{bmatrix}^{-1} = \begin{bmatrix} I + Z^TWA^{-1}W^TZ & -Z^TWA^{-1} \\ -A^{-1}W^TZ & A^{-1} \end{bmatrix} $$ where $A = W^TW - W^TZZ^TW = W^T(I-ZZ^T)W$ (the Schur complement of the top-left block) is the cross-product matrix of $W$ after projecting its columns onto the space orthogonal to the column space of $Z$.

It is not true in general that $I + Z^TWA^{-1}W^TZ = I$, so marginally we are not guaranteed independence in the $\hat\beta_Z$. But now if we condition $\hat\beta_Z$ on $\hat\beta_W$ we obtain $$ \text{Var}(\hat\beta_Z \mid \hat\beta_W) = I + Z^TWA^{-1}W^TZ - Z^TWA^{-1} \cdot A \cdot A^{-1}W^TZ = I $$ so we do indeed have conditional independence.

$\square$
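
As a final sanity check (my own addition, not part of the proof; variable names are hypothetical), here is a short NumPy sketch that builds a design with an orthonormal block $Z$ and an arbitrary block $W$ and verifies numerically that $\text{Var}(\hat\beta_Z \mid \hat\beta_W) = I$ when $\sigma^2 = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, r = 50, 3, 2                    # q columns in Z, r columns in W

# Z gets orthonormal columns via a QR factorization; W is unconstrained.
Z, _ = np.linalg.qr(rng.normal(size=(n, q)))
W = rng.normal(size=(n, r))
X = np.hstack([Z, W])

# With sigma^2 = 1, Cov(beta_hat) = (X^T X)^{-1}; split it into blocks.
S = np.linalg.inv(X.T @ X)
S_zz, S_zw, S_ww = S[:q, :q], S[:q, q:], S[q:, q:]

# Gaussian conditioning: Var(beta_Z | beta_W) = S_zz - S_zw S_ww^{-1} S_zw^T.
cond = S_zz - S_zw @ np.linalg.inv(S_ww) @ S_zw.T
print(np.round(cond, 10))             # prints the q x q identity (up to rounding)
```
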