I scanned through several posts on a similar topic, but only found intuitive explanations (no proof-based explanations).
Let's say I have two models, the first of which represents the true data, $y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \varepsilon$, where $X_1$ and $X_2$ are fixed regressors, and the second of which represents the reduced version, $y = \beta_0 + \beta_1X_1 + \varepsilon$. The second model gives us $\hat{\beta_1}$. Will $\hat{\beta_1}$ be a biased estimator for $\beta_1$?
My first instinct is that it will be biased only if $X_2$ is a genuine predictor, i.e. $\beta_2 \ne 0$ and $X_2$ is correlated with $X_1$.
I found mixed approaches to this problem, but here is the best derivation I came up with.
$$\hat{\beta_1} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})\,y_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2}.$$
Taking expectations,
$$E(\hat{\beta_1}) = \frac{\sum_{i=1}^{n}(x_i-\bar{x})\,E(y_i)}{\sum_{i=1}^{n}(x_i-\bar{x})^2} = \beta_0\frac{\sum_{i=1}^{n}(x_i-\bar{x})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} + \beta_1\frac{\sum_{i=1}^{n}(x_i-\bar{x})\,x_i}{\sum_{i=1}^{n}(x_i-\bar{x})^2}.$$
Does this sufficiently prove that it is unbiased for $\beta_1$?
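(As a sanity check of the derivation above, here is a quick simulation sketch of the one-predictor case, with made-up coefficients and Gaussian errors, showing the estimator centering on $\beta_1$:)

```python
import numpy as np

# Made-up coefficients for illustration only.
rng = np.random.default_rng(1)
b0, b1 = 0.5, 2.0
n, reps = 100, 5000
x = rng.normal(size=n)                      # fixed regressor values
den = np.sum((x - x.mean())**2)

slopes = []
for _ in range(reps):
    y = b0 + b1 * x + rng.normal(size=n)    # zero-mean errors
    # Slope formula from above: sum((x_i - xbar) * y_i) / sum((x_i - xbar)^2)
    slopes.append(np.sum((x - x.mean()) * y) / den)

print(np.mean(slopes))                      # close to b1 = 2.0
```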
Best Answer
We need to take some care with the notation because the models differ.
Let the first (correct) model be
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon\tag{1}$$
where the $\varepsilon_i$ have a common variance and zero means; and write the second model (which governs the very same variables $Y$, so no need to change their name) as
$$Y = \alpha_0 + \alpha_1 X_1 + \delta.\tag{2}$$
As an aside, we may impose no additional assumptions on $\delta$ because these random variables are completely determined by equating the two right hand sides (which, after all, equal the same things):
$$\delta = (\beta_0 - \alpha_0) + (\beta_1 - \alpha_1)X_1 + \beta_2 X_2 + \varepsilon.$$
(From now on I will drop the generic discussion of models to focus on a dataset with explanatory values $x_{1i}$ and $x_{2i},$ responses $y_i,$ and associated error $\varepsilon_i$ and $\delta_i.$)
We can infer, however, that the $\delta_i$ all have the same variance as the $\varepsilon_i$, and their means are
$$E[\delta_i] = (\beta_0 - \alpha_0) + (\beta_1 - \alpha_1)x_{1i} + \beta_2 x_{2i},$$
which may vary among observations.
Let's return to the analysis. Fitting the second model gives the slope estimate
$$\hat\alpha_1 = \frac{\sum_{i} (y_i - \bar y)(x_{1i} - \bar{x}_1)}{\sum_{i} (x_{1i} - \bar{x}_1)^2}.\tag{*}$$
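(To confirm $(*)$ is the usual least-squares slope, a quick check with arbitrary simulated data, comparing it against a library fit:)

```python
import numpy as np

# Hypothetical data purely to check the algebra of (*) against a library fit.
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 + rng.normal(size=50)

# Slope per formula (*):
alpha1_hat = np.sum((y - y.mean()) * (x1 - x1.mean())) / np.sum((x1 - x1.mean())**2)
# Least-squares slope from numpy's degree-1 polynomial fit:
slope_np = np.polyfit(x1, y, 1)[0]
```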
This is a linear combination of the $y_i-\bar y,$ so use the zero-mean assumption about the $\varepsilon_i$ to compute
$$E[y_i - \bar y] = (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) -(\beta_0 + \beta_1 \bar{x}_1 + \beta_2 \bar{x}_2) = \beta_1(x_{1i}-\bar{x}_1) + \beta_2(x_{2i} - \bar{x}_2)$$
and apply linearity of expectation in $(*)$ to compute
$$E[\hat\alpha_1] = \beta_1 + \beta_2\,\frac{\sum_{i} (x_{2i}-\bar{x}_2)(x_{1i} - \bar{x}_1)}{\sum_{i} (x_{1i} - \bar{x}_1)^2}.$$
Equating this with $\beta_1$ to assess the bias in using $\hat\alpha_1$ to estimate $\beta_1,$ we find it will be unbiased if and only if the second term is zero. This can happen in two ways:
1. If $\beta_2 = 0.$ (This just means the second model is correct.)
2. If $\sum_{i} (x_{2i}-\bar{x}_2)(x_{1i} - \bar{x}_1)=0.$ This means the covariance of the $x_1$ data and the $x_2$ data is zero: that is, the design vectors are orthogonal.
If neither of these is the case, the bias is nonzero. That agrees exactly with your intuition.
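Both cases, and the bias formula itself, can be illustrated with a simulation sketch (made-up coefficients; $x_2$ either correlated with $x_1$ or residualized to be orthogonal to it):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, beta2 = 1.0, 2.0, 3.0     # made-up true coefficients
n, reps = 200, 2000

def mean_slope(x1, x2):
    """Average slope from fitting the reduced model y ~ x1, over many error draws."""
    c1 = x1 - x1.mean()
    slopes = []
    for _ in range(reps):
        y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
        slopes.append(np.sum(c1 * (y - y.mean())) / np.sum(c1**2))
    return np.mean(slopes)

x1 = rng.normal(size=n)
c1 = x1 - x1.mean()
x2_corr = x1 + rng.normal(size=n)               # correlated with x1
x2c = x2_corr - x2_corr.mean()
x2_orth = x2c - (c1 @ x2c / (c1 @ c1)) * c1     # orthogonal to x1 by construction

# Bias term predicted by the formula for E[alpha1_hat]:
bias = beta2 * (c1 @ x2c) / (c1 @ c1)

print(mean_slope(x1, x2_orth))   # close to beta1 (unbiased: orthogonal design)
print(mean_slope(x1, x2_corr))   # close to beta1 + bias (omitted-variable bias)
```

The orthogonal case recovers $\beta_1$ even though $\beta_2 \ne 0$, while the correlated case matches $\beta_1$ plus exactly the bias term derived above.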