Linear Regression – Understanding Why Error Properties in Linear Regression Are Assumed True by Construction

assumptions, linear model, regression, residuals

The following two results about the residuals ($\epsilon$) in linear regression are often stated as assumptions of the linear regression model:

  1. $E(\epsilon) = 0$
  2. $cov(X, \epsilon) = 0$

Here is MIT 18.650 professor Philippe Rigollet stating that these are assumptions.

However, both of these are simply consequences of fitting a least squares line through any data; no assumptions are involved. You fit $Y = b + aX + \epsilon$ and minimize $\sum_i (Y_i - (b + aX_i))^2$, and the two results stated above fall out naturally.


So, how are these assumptions and not necessary results that the residuals ($\epsilon$) would follow when the optimal line is found?
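The claim in the question can be checked numerically. Below is a minimal sketch (NumPy, with an arbitrary and deliberately nonlinear data-generating process invented purely for illustration) showing that the fitted residuals have zero mean and zero sample covariance with $X$ no matter what the data look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary data -- no modeling assumptions whatsoever.
x = rng.normal(size=200)
y = np.exp(x) + rng.uniform(-1, 1, size=200)  # deliberately nonlinear

# Fit Y = b + a*X by ordinary least squares.
a, b = np.polyfit(x, y, 1)
resid = y - (b + a * x)

# Both properties hold mechanically, up to floating-point error.
print(np.isclose(resid.mean(), 0.0))           # True
print(np.isclose(np.cov(x, resid)[0, 1], 0.0)) # True
```

These are exactly the first-order (normal) equations of the least squares minimization, so they hold for any data set, which is the point of the question.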

Best Answer

Distinctions

First, let us distinguish between two levels.

  1. The true model. We can also call this the data generating process or a structural model. It is structural in the sense that it reflects how each variable is structured or generated.
  2. The fitted model. This is independent of how the data came to be generated. This is just a set of rules that receives data as input and gives some statistics as output.

Example

Consider a typical scenario where we have the effect of $X$ on $Y$ but also the presence of an unobserved confounder $U$. So we have:

  • $X \longrightarrow Y$, and
  • $X \longleftarrow U \longrightarrow Y$

Assume for simplicity that all relationships are linear and additive. We could then explicitly write down the true model as:

$$ Y_i = \beta_0 + \beta_1 X_i + \beta_2 U_i + \epsilon_i $$

What tends to be common in econometrics, is not to separate the unobserved confounder from the error term, so they would write the true model as

$$ Y_i = \beta_0 + \beta_1 X_i + \eta_i $$

where $\eta_i = \beta_2 U_i + \epsilon_i$. Note that the regression coefficients coincide only in this simple linear, additive case. Given the true generating process we described (and some additional graphical assumptions), we know that $\eta$ and $X$ are associated, since $U$ is a cause of $X$ and $U$ is part of $\eta$. Econometricians say that $X$ is endogenous (in the true model), because $\operatorname{Cov}[X,\eta] \neq 0$. This is why they also say that an omitted variable is a source of endogeneity.
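This endogeneity can be simulated directly. In the sketch below, the coefficients ($\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$, and a $U \to X$ effect of $0.5$) are invented for illustration, not taken from the answer:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# True model with an unobserved confounder: U -> X and U -> Y.
u = rng.normal(size=n)
x = 0.5 * u + rng.normal(size=n)   # U causes X
eps = rng.normal(size=n)
y = 1 + 2 * x + 3 * u + eps        # beta_0=1, beta_1=2, beta_2=3 (illustrative)

# Fold the unobserved confounder into the error term: eta = beta_2*U + eps.
eta = 3 * u + eps

# X is endogenous in the true model: Cov[X, eta] != 0.
print(np.cov(x, eta)[0, 1])  # ~1.5, since Cov[X, eta] = beta_2 * Cov[X, U] = 3 * 0.5
```

The sample covariance sits near its population value of $1.5$, confirming $\operatorname{Cov}[X,\eta] \neq 0$ in the true model.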

Question

You ask

So, how are these assumptions and not necessary results that the residuals ($\epsilon$) would follow when the optimal line is found?

Let's say you fit the model you describe, then you get something like:

$$ Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \delta_i $$

As you say, by construction of OLS, $\operatorname{Cov}[X, \delta] = 0$. But, and here is the crucial part, your $\hat{\beta}_1$ is not an estimator of the true $\beta_1$, the causal or structural parameter you care about, because the fitted model assumes things that are not true in the structural model.

In short, what your teacher leaves implicit is the assumption that the fitted (empirical) model is correctly specified. This assumption entails that the consequences of our estimation strategy, e.g. OLS, also apply to the true model, i.e. $\mathbb{E}[\eta] = 0$ and $\operatorname{Cov}[\eta, X] = 0$, which is not the case here.
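The mismatch between the two levels can be seen in one simulation. The sketch below uses the confounded true model from the example with invented coefficients ($\beta_1 = 2$, $\beta_2 = 3$, and a $U \to X$ effect of $0.5$): the fitted residuals are orthogonal to $X$ by construction, yet $\hat{\beta}_1$ does not recover the structural $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Confounded true model with illustrative coefficients: U -> X and U -> Y.
u = rng.normal(size=n)
x = 0.5 * u + rng.normal(size=n)
y = 1 + 2 * x + 3 * u + rng.normal(size=n)  # structural beta_1 = 2

# Fit Y on X alone (the misspecified empirical model); U is unobserved.
b1_hat, b0_hat = np.polyfit(x, y, 1)
delta = y - (b0_hat + b1_hat * x)

# By construction of OLS, Cov[X, delta] = 0 ...
print(np.isclose(np.cov(x, delta)[0, 1], 0.0))  # True
# ... yet b1_hat is biased away from the structural beta_1 = 2:
print(b1_hat)  # ~3.2 = beta_1 + beta_2 * Cov[X, U] / Var[X] = 2 + 3*0.5/1.25
```

So the mechanical properties of the fitted residuals tell us nothing about whether $\operatorname{Cov}[X, \eta] = 0$ holds in the true model; that has to be assumed.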

This is directly related to the point brought up by markowitz and to the ambiguity in econometrics textbooks regarding these two levels.
