[Math] What’s the variance of the intercept estimator in multiple linear regression?

Suppose a linear regression model
$Y=X\beta+\epsilon$
where $X$ is an $n$-by-$(k+1)$ matrix and $\epsilon$ follows $N(0,\sigma^2 I_n)$. Here $k$ is the number of explanatory variables, and the first column of $X$ is all ones (the intercept column).

Or we can write it in this form: $Y=\beta_0+\beta_1 X_1+\dots+\beta_k X_k+\epsilon$.

I learned from the book "Introductory Econometrics" by Wooldridge that the variance of $\hat\beta_j$ is

$$\operatorname{var}(\hat\beta_j)=\frac{\sigma^2}{SST_j\,(1-R_j^2)},$$

where $SST_j=\sum_i (x_{ij}-\bar x_j)^2$ and $R_j^2$ is the $R^2$ from regressing $x_j$ on all the other explanatory variables. This only holds for $j=1,\dots,k$. For $\hat\beta_0$ we have $SST_0=0$ (the first column of $X$ is constant), so the formula is not valid for the intercept.

I already know that $\operatorname{var}(\hat\beta)=\sigma^2(X'X)^{-1}$, but that doesn't give an explicit formula for $\operatorname{var}(\hat\beta_0)$. Is there a clean closed form, like the one we have for $\operatorname{var}(\hat\beta_j)$?
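As a sanity check (my own sketch, not from the post), a few lines of NumPy confirm on simulated data that Wooldridge's formula reproduces the diagonal of $\sigma^2(X'X)^{-1}$ for the slope coefficients; the simulated design and all names here are purely illustrative:

```python
# Check: sigma^2 / (SST_j * (1 - R_j^2)) == sigma^2 * [(X'X)^{-1}]_{jj}
# for the slope coefficients j = 1, ..., k, on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
sigma2 = 2.0

Z = rng.normal(size=(n, k))            # explanatory variables X_1, ..., X_k
X = np.column_stack([np.ones(n), Z])   # design matrix with intercept column

var_beta = sigma2 * np.linalg.inv(X.T @ X)   # sigma^2 (X'X)^{-1}

for j in range(1, k + 1):
    xj = X[:, j]
    others = np.delete(X, j, axis=1)
    # R_j^2 from regressing x_j on all other columns (incl. the intercept)
    xj_hat = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
    ss_res = np.sum((xj - xj_hat) ** 2)
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1 - ss_res / sst_j
    wooldridge = sigma2 / (sst_j * (1 - r2_j))
    print(j, np.isclose(wooldridge, var_beta[j, j]))  # True for each slope
```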

I also know that if $k=1$ then $\operatorname{var}(\hat\beta_0)=\dfrac{\sigma^2\sum_i x_i^2}{n\,SST_x}$. Is there any similar result when $k>1$?
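For completeness (a standard computation, not spelled out in the original post), that $k=1$ formula falls straight out of $\sigma^2(X'X)^{-1}$:
\begin{equation} X'X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix}, \qquad \det(X'X) = n\sum x_i^2 - \Big(\sum x_i\Big)^2 = n\,SST_x, \end{equation}
so the $(1,1)$ entry of the inverse is $\sum x_i^2/(n\,SST_x)$ and
\begin{equation} \operatorname{var}(\hat\beta_0) = \sigma^2\,[(X'X)^{-1}]_{1,1} = \frac{\sigma^2\sum x_i^2}{n\,SST_x}. \end{equation}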

QUESTION: What is the variance of the intercept estimator?

$\operatorname{var}(\hat\beta_0)=?$

Best Answer

Usually your $X$ will look like this:
\begin{equation} X = \begin{bmatrix} \mathbf{1} & X_1 & \ldots & X_k \end{bmatrix} \end{equation}
where $\mathbf{1}$ is an all-ones vector of size $N$ (the number of observations, $n$ in your notation) and $X_j$ is the $(j+1)^{\text{th}}$ column of $X$. Let $Z$ denote the matrix $X$ with its first column omitted, and let
\begin{equation} s = Z^T \mathbf{1} = \begin{bmatrix} \sum_i X_{i1} \\ \vdots \\ \sum_i X_{ik} \end{bmatrix} \end{equation}
be the vector of column sums of $Z$. Since $\mathbf{1}^T\mathbf{1} = N$, we get the block matrix
\begin{equation} X^T X = \begin{bmatrix} N & s^T \\ s & Z^T Z \end{bmatrix}. \end{equation}
Using block matrix inversion, the first element in the first row of $(X^T X)^{-1}$ is
\begin{equation} [(X^T X)^{-1}]_{1,1} = N^{-1} + N^{-1} s^T (Z^T Z - s N^{-1} s^T)^{-1} s N^{-1}. \end{equation}
Since $N$ is a scalar, this simplifies to
\begin{equation} [(X^T X)^{-1}]_{1,1} = \frac{1}{N} + \frac{1}{N^2}\, s^T \Big(Z^T Z - \frac{1}{N} s s^T\Big)^{-1} s. \tag{1} \end{equation}
But
\begin{equation} \operatorname{var}(\hat{\beta}_0) = \sigma^2\, [(X^T X)^{-1}]_{1,1}, \tag{2} \end{equation}
so using $(1)$ in $(2)$ we get
\begin{equation} \operatorname{var}(\hat{\beta}_0) = \frac{\sigma^2}{N} + \frac{\sigma^2}{N^2}\, s^T \Big(Z^T Z - \frac{1}{N} s s^T\Big)^{-1} s. \end{equation}
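Here is a quick numerical check of that final expression (my own sketch, not part of the original answer; $Z$ and $s$ are the names introduced above, and the simulated data are illustrative):

```python
# Verify the block-inversion expression for [(X'X)^{-1}]_{1,1}
# against a direct matrix inverse.
import numpy as np

rng = np.random.default_rng(1)
N, k = 150, 4
Z = rng.normal(size=(N, k))             # columns X_1, ..., X_k
X = np.column_stack([np.ones(N), Z])

s = Z.sum(axis=0)                       # s = Z' 1, the vector of column sums
schur = Z.T @ Z - np.outer(s, s) / N    # Schur complement of the (1,1) block
block_formula = 1 / N + (s @ np.linalg.inv(schur) @ s) / N**2

direct = np.linalg.inv(X.T @ X)[0, 0]
print(np.isclose(block_formula, direct))  # True
```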
