Solved – Violation of Gauss-Markov assumptions

Tags: assumptions, regression, self-study

Which of the Gauss-Markov assumptions is violated in this picture?

If all other Gauss-Markov assumptions are satisfied, is the OLS estimator for $\beta_1$ unbiased and consistent? Why?

[Figure: the error term u plotted against Einkommen (income)]

In the diagram, u is the error term and Einkommen is income (the explanatory variable).

The model is specified as follows:

$y = \beta_0 + \beta_1 \text{einkommen} + u$

The problem is taken from an exam.

My thoughts (x is einkommen):

The figure shows a quadratic pattern.

The Gauss-Markov assumptions are:
(1) linearity in parameters
(2) random sampling
(3) sampling variation of x (not all the same values)
(4) zero conditional mean E(u|x)=0
(5) homoskedasticity

I think (4) is satisfied, because there are residuals both above and below 0.

(5) is satisfied, since the variation seems to be constant over all x.
(3) is satisfied, since einkommen does not take the same value for all observations.
(2) Random sampling is satisfied; don't ask me why.
So only (1) is left: the model is not linear in parameters.

I hope I am not totally wrong with my thoughts.

Best Answer

There have been a couple of answers, and none of them touched on what I thought were the most interesting questions asked: the bias and consistency of misspecified linear models. Since the residuals make it fairly clear that the model is missing a quadratic term, let's look at what happens to our estimates. For funsies, I'll keep this in terms of a general misspecification instead of only a quadratic one.

Suppose an oracle tells us that the generating process for the data is $Y=X \beta +Z \alpha +\epsilon$, but the model we choose to fit is $Y=X \beta+\epsilon$. Note that the true model contains extra data in the form of $Z$ and extra parameters in the form of $\alpha$. We could think of $Z$ as data we were unable to collect or chose not to collect, but we could also think of it as data we collected and chose not to include in our model. That is the situation you are in: here $Z$ would be the squared income term.
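To make this concrete, here is a minimal simulation sketch. It assumes a hypothetical process in which the omitted $Z$ is simply $x^2$ (standing in for squared income), with made-up values for the parameters; `fit_line` is a helper written for this example, not a library function. Fitting only the straight line leaves residuals whose average is positive at both ends of $x$ and negative in the middle, i.e. the omitted $Z\alpha$ shows up as a systematic pattern in the error.

```python
import random


def fit_line(x, y):
    """Simple OLS of y on an intercept and x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data-generating process: Y = b0 + b1*x + alpha*Z + eps, with Z = x**2.
rng = random.Random(0)
beta0, beta1, alpha, sigma = 1.0, 2.0, 3.0, 0.1
x = [rng.uniform(0.0, 1.0) for _ in range(3000)]
y = [beta0 + beta1 * xi + alpha * xi ** 2 + rng.gauss(0.0, sigma) for xi in x]

# Fit the misspecified model y = b0 + b1*x and collect its residuals.
b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Average residual in three bins of x: low, middle, high.
bins = {"low": [], "mid": [], "high": []}
for xi, ri in zip(x, resid):
    key = "low" if xi < 1 / 3 else ("mid" if xi < 2 / 3 else "high")
    bins[key].append(ri)
bin_means = {k: sum(v) / len(v) for k, v in bins.items()}

# Signs come out positive, negative, positive: the mean residual depends on x.
print(bin_means)
```

The residuals still sum to (numerically) zero overall, as they always do with an intercept, so having points above and below 0 says nothing about whether their mean is 0 at each value of $x$.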

Now, the usual parameter estimate is $\hat{\beta}=(X^{T}X)^{-1}X^{T}Y$. Bias concerns the expectation of our estimate, and if we want consistency, we need the bias to disappear asymptotically. Keeping that in mind, and using $E[Y] = X\beta + Z\alpha$ (treating $X$ and $Z$ as fixed, with $E[\epsilon]=0$), we look at our expectation: $E[\hat{\beta}]=(X^{T}X)^{-1}X^{T}E[Y] = (X^{T}X)^{-1}X^{T}(X\beta + Z\alpha) = \beta +(X^{T}X)^{-1}X^{T}Z \alpha$.
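The expectation formula can be checked numerically. The sketch below is a Monte Carlo check under assumptions of my own: a fixed design with $X = [1, x]$, $Z = x^2$, hypothetical parameter values, and a helper `ols_slope` written for the example. Because $(X^{T}X)^{-1}X^{T}Z$ is just the coefficient vector from regressing $Z$ on $X$, the slope entry of the bias term is $\alpha$ times the slope of $z$ on $x$. Averaging the fitted slope over many fresh draws of $\epsilon$ should land on $\beta_1$ plus that bias, not on $\beta_1$.

```python
import random


def ols_slope(x, y):
    """Slope from regressing y on an intercept and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Fixed design (X held constant across replications), hypothetical values.
n = 50
x = [i / (n - 1) for i in range(n)]
z = [xi ** 2 for xi in x]  # the omitted regressor Z
beta0, beta1, alpha, sigma = 1.0, 2.0, 3.0, 1.0

# Slope entry of (X^T X)^{-1} X^T Z alpha: alpha times the slope of z on x.
predicted_bias = alpha * ols_slope(x, z)

# Monte Carlo approximation of E[beta1_hat]: redraw epsilon, refit, average.
rng = random.Random(1)
reps = 2000
total = 0.0
for _ in range(reps):
    y = [beta0 + beta1 * xi + alpha * zi + rng.gauss(0.0, sigma)
         for xi, zi in zip(x, z)]
    total += ols_slope(x, y)
mc_mean = total / reps

print(f"beta1 = {beta1}, predicted E[slope] = {beta1 + predicted_bias:.3f}, "
      f"Monte Carlo mean = {mc_mean:.3f}")
```

With this design, $x$ is symmetric around $1/2$, so the slope of $x^2$ on $x$ is exactly 1 and the predicted bias equals $\alpha$ (up to rounding).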

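So, if we misspecify and $\alpha$ is not a vector of 0's, our estimates are biased by the term $(X^{T}X)^{-1}X^{T}Z \alpha$, which is nonzero unless $X^{T}Z\alpha = 0$, i.e. unless the omitted part $Z\alpha$ is orthogonal to every column of $X$. A squared income term is strongly correlated with income itself, so in your case the bias is real. Likewise, since consistency depends on asymptotic unbiasedness and this bias term has no reason to disappear as the sample grows (when $X$ and $Z$ are correlated in the population it settles at a nonzero limit), we can expect the parameter estimates to fail to be consistent as well.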
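A minimal sketch of the consistency point, again under hypothetical assumptions: $x \sim \text{Uniform}(0, 1)$, omitted $Z = x^2$, made-up parameter values, and a helper `ols_slope` written for the example. In the limit the slope estimate tends to $\beta_1 + \alpha\,\text{Cov}(x, x^2)/\text{Var}(x)$. For this $x$, $\text{Cov}(x, x^2) = 1/4 - 1/6 = 1/12 = \text{Var}(x)$, so the estimate should settle near $\beta_1 + \alpha$ rather than $\beta_1$ as $n$ grows.

```python
import random


def ols_slope(x, y):
    """Slope from regressing y on an intercept and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical process: x ~ Uniform(0, 1), omitted Z = x**2.
# Cov(x, x^2) / Var(x) = (1/12) / (1/12) = 1, so the limit is beta1 + alpha.
beta0, beta1, alpha, sigma = 1.0, 2.0, 3.0, 1.0
rng = random.Random(2)

estimates = {}
for n in (100, 10_000, 200_000):
    x = [rng.uniform(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + alpha * xi ** 2 + rng.gauss(0.0, sigma) for xi in x]
    estimates[n] = ols_slope(x, y)
    print(n, round(estimates[n], 3))
```

Larger samples only shrink the noise around the wrong value: the estimates close in on $\beta_1 + \alpha$, not on $\beta_1$.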
