Solved – Linear Regression – Conditions for unbiased estimate

Tags: estimators, linear, linear model, regression, unbiased-estimator

When is the linear regression estimate of $\beta_1$ in the model

$$ Y= X_1\beta_1 + \delta$$

unbiased, given that the $(x,y)$ pairs are generated with the following model?

$$ Y= X_1\beta_1 + X_2\beta_2 + \delta$$

We have that the expected value of $\hat{\beta}_1$, the OLS estimate from regressing $Y$ on $X_1$ alone, is

\begin{align*}
E[\hat{\beta}_1|X_1,X_2] &= E[(X_1^TX_1)^{-1}X_1^T(X_1\beta_1+X_2\beta_2+\delta)|X_1,X_2]\\
&=\beta_1 + E[(X_1^TX_1)^{-1}X_1^TX_2\beta_2+(X_1^TX_1)^{-1}X_1^T\delta|X_1,X_2]\\
&= \beta_1+E[(X_1^TX_1)^{-1}X_1^TX_2\beta_2 | X_1,X_2] + 0,
\end{align*}

where the last step uses $E[\delta|X_1,X_2]=0$.

Now, when is the second term 0 (i.e., when is $\hat{\beta}_1$ an unbiased estimator)? I have read that it is 0 if $X_1$ and $X_2$ are independent.

But which property allows me to conclude that?

Best Answer

It is zero when the columns of $\mathbf{X}_1$ are orthogonal to the columns of $\mathbf{X}_2$, i.e. $\mathbf{X}_1^T\mathbf{X}_2 = \mathbf{0}$, so that the two column spaces are orthogonal to one another. For centered variables this means the regressors are uncorrelated, which is not quite the same thing as independence: it is a weaker condition, since independence implies zero correlation but not conversely.
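A quick numerical check of this claim (a sketch; the design matrices below are illustrative, chosen only so that $\mathbf{X}_1^T\mathbf{X}_2 = \mathbf{0}$):

```python
import numpy as np

# X1 has two columns; X2's single column is orthogonal to both of them.
X1 = np.array([[1.0, 0.0],
               [1.0, 0.0],
               [0.0, 1.0],
               [0.0, 1.0]])
X2 = np.array([[1.0],
               [-1.0],
               [1.0],
               [-1.0]])
beta2 = np.array([3.0])

# Orthogonal column spaces: X1^T X2 is the zero matrix ...
print(X1.T @ X2)

# ... so the bias term (X1^T X1)^{-1} X1^T X2 beta2 is exactly zero.
bias = np.linalg.inv(X1.T @ X1) @ X1.T @ X2 @ beta2
print(bias)  # [0. 0.]
```

The point is purely algebraic: once $\mathbf{X}_1^T\mathbf{X}_2 = \mathbf{0}$, the bias term vanishes identically, with no appeal to the distribution of the regressors.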

If the regressors are correlated, however, and you regress on $\mathbf{X}_1$ only, you will end up with a biased estimate of $\boldsymbol{\beta}_1$. This is known as omitted-variable bias.
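A small simulation makes the bias visible (a sketch; the coefficients, the 0.8 correlation strength, and the sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta1, beta2 = 2.0, 3.0

# Correlated regressors: x2 depends on x1.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Regress y on x1 alone: OLS slope = (x1^T x1)^{-1} x1^T y.
b1_short = (x1 @ y) / (x1 @ x1)
print(b1_short)  # near beta1 + 0.8 * beta2 = 4.4, not beta1 = 2

# With an uncorrelated x2 the omitted variable only adds noise, not bias.
x2_orth = rng.normal(size=n)
y_orth = beta1 * x1 + beta2 * x2_orth + rng.normal(size=n)
b1_orth = (x1 @ y_orth) / (x1 @ x1)
print(b1_orth)  # near beta1 = 2
```

The first slope absorbs $0.8\,\beta_2$ of the omitted regressor's effect, matching the bias term in the derivation above; the second recovers $\beta_1$ up to sampling noise.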
