Regression – Understanding Zero Conditional Expectation of Error in OLS Regression

least-squares regression

Suppose we have a dependent variable $Y$ and an independent variable $X$ in a population, and we want to estimate the linear model
$$
Y = \beta_{0} + \beta_{1}X + \varepsilon
$$

Using the least-squares method, we obtain estimates $\hat{\beta_{0}}$ and $\hat{\beta_{1}}$, and so in a sample of this population, we have for each $i$ in the sample
$$
y_{i} = \hat{\beta_{0}} + \hat{\beta_{1}}x_{i} + e_{i}
$$

where $e_{i}$ is the residual associated with observation $i$. Now, one essential assumption here is that the conditional distribution of $e_{i}$ given $X$ is normal, and
$$
\mathbb{E}(e_{i}|X) = 0
$$

I don't fully understand how $e_{i}$ can be looked at as a random variable given an $X$. What precisely is the random variable $e_{i}$, i.e. what different values can it take on? Given estimates $\hat{\beta_{0}}$ and $\hat{\beta_{1}}$ and a value $X$, it seems to me that the $e_{i}$ just take on a finite number of fixed values (could even be 1); so in what sense is it looked at as a random variable?

Alternatively, does the "randomness" in $e_{i}$ come because we consider the error terms associated with different estimates of the regression coefficients? In other words, does the zero conditional expectation of errors mean that given an $X = x$, if we picked different samples of the population containing $x$ and estimated the least squares line for each of these samples, the error associated with $x$ should, on average, be zero?

Best Answer

Even when the regressors are given, residuals remain random variables: conditioning on the regressors does not reduce them to constants. In other words, if you have $x_i$ you can compute, given the estimated coefficients, the predicted value of $y_i$, but that prediction retains its uncertainty.

However, you are right that the residual values depend on the estimated coefficients.
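A minimal numpy sketch of this point, under assumed (hypothetical) true coefficients and a fixed design: across repeated samples, the estimated coefficients change, so the residual at one and the same $x$ value changes too, and it averages out near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 2.0        # hypothetical "true" coefficients
x = np.linspace(0, 10, 50)     # fixed design, so "given X" is literal

# Draw many samples; each sample yields different OLS estimates,
# hence a different residual at the same x value.
residuals_at_x0 = []
for _ in range(1000):
    eps = rng.normal(0, 1, size=x.size)   # errors, independent of X
    y = beta0 + beta1 * x + eps
    b1, b0 = np.polyfit(x, y, 1)          # OLS estimates vary by sample
    e = y - (b0 + b1 * x)                 # residuals for this sample
    residuals_at_x0.append(e[0])          # residual at the first x value

print(np.mean(residuals_at_x0))  # close to 0 across samples
```

This is one way to make the asker's second reading concrete: the "randomness" of a residual at a fixed $x$ comes from the sampling variability of the data and of the estimated coefficients.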

Now, note that the condition you wrote, $E[e_i|X]=0$, is incorrect because it is stated on residuals. I fear you are conflating residuals with errors. This confusion is widespread and quite dangerous.

Following your notation, the condition should be $E[\varepsilon_i|X]=0$, and it makes sense only if we interpret the true model as a structural equation, not as something like a population regression (you speak about a "linear model" in your question, a name that is too general and ambiguous yet frequently used). Misunderstandings like these have produced many problems among students, and in the literature as well.

These posts may help you and other readers:

What is the actual definition of endogeneity?

Does homoscedasticity imply that the regressor variables and the errors are uncorrelated?

Endogeneity testing using correlation test

Regression's population parameters