The difference between stochastic and non-stochastic regressors in linear regression

linear-model, regression

Suppose the regression specification is $$y_i=\beta_0+\beta_1x_i+\epsilon_i.$$
Whether or not $x_i$ is stochastic, we need the assumption that $\epsilon_i$ is identically distributed for all $i$. However, if $x_i$ is a random variable rather than a fixed value, an additional assumption is needed: the disturbance term has zero conditional expectation, $E[\epsilon_i\mid x_i]=0$. A stronger, sufficient condition is that $\epsilon_i$ is distributed independently of $x_i$.
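To see where this assumption enters, one can write the OLS slope estimator in terms of the errors (a standard derivation, included here as a sketch):

$$\hat\beta_1=\frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2}=\beta_1+\frac{\sum_i (x_i-\bar x)\,\epsilon_i}{\sum_i (x_i-\bar x)^2}.$$

Taking expectations conditional on the regressors,

$$E[\hat\beta_1\mid x_1,\dots,x_n]=\beta_1+\frac{\sum_i (x_i-\bar x)\,E[\epsilon_i\mid x_1,\dots,x_n]}{\sum_i (x_i-\bar x)^2},$$

so $\hat\beta_1$ is unbiased when $E[\epsilon_i\mid x_1,\dots,x_n]=0$ (with independent sampling across $i$ this reduces to $E[\epsilon_i\mid x_i]=0$). With fixed regressors the conditioning is trivial and $E[\epsilon_i]=0$ is enough; with stochastic regressors, any dependence of the error's mean on $x$ shows up directly as bias.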

My question is: how does this assumption make a difference in practice? It seems to me that there is no way to assess whether $\epsilon_i$ is distributed independently of $x_i$, since we only have one observation of $(x_i,y_i)$ for each $i$.
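The difficulty can be illustrated with a short simulation. OLS residuals are orthogonal to $x_i$ by construction, so their sample correlation with the regressor is essentially zero whether or not the true errors are exogenous. The sketch below uses Python with NumPy; the hidden factor `u` that drives both $x_i$ and the error, and the helper names `ols_residuals` and `correlations`, are hypothetical and made up for illustration:

```python
import numpy as np

def ols_residuals(x, y):
    """Fit y on an intercept and x by OLS and return the residuals."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def correlations(x, eps, beta0=1.0, beta1=2.0):
    """Build y from the true errors, then return
    (corr(x, true errors), corr(x, OLS residuals))."""
    y = beta0 + beta1 * x + eps
    resid = ols_residuals(x, y)
    return np.corrcoef(x, eps)[0, 1], np.corrcoef(x, resid)[0, 1]

rng = np.random.default_rng(0)
n = 5_000
u = rng.normal(size=n)            # hypothetical unobserved factor
x = rng.normal(size=n) + u        # stochastic regressor

eps_exog = rng.normal(size=n)          # case A: error independent of x
eps_endo = u + rng.normal(size=n)      # case B: error shares u with x

for label, eps in [("exogenous", eps_exog), ("endogenous", eps_endo)]:
    c_true, c_resid = correlations(x, eps)
    print(f"{label:>10}: corr(x, true errors) = {c_true:+.3f}, "
          f"corr(x, OLS residuals) = {c_resid:+.1e}")
```

In the endogenous case the true errors are built to have correlation 0.5 with $x_i$, yet the residual correlation is zero up to rounding in both cases, so exogeneity cannot be checked by inspecting the OLS residuals.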

Best Answer

In practice, the difference is huge. The exogeneity assumption you refer to requires that the errors be uncorrelated with the regressors. If they are correlated, the OLS estimates from a regression with stochastic regressors are biased and inconsistent, so you cannot rely on them.
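A small Monte Carlo sketch (Python with NumPy; the data-generating process and the helpers `slope` and `mean_slope` are invented for illustration) makes the size of the problem concrete. The hypothetical error is built as `eps = rho * x + v` with $x\sim N(0,1)$, so `rho` is the covariance between the regressor and the error, and OLS is centered on $\beta_1+\operatorname{Cov}(x,\epsilon)/\operatorname{Var}(x)$ rather than on $\beta_1$:

```python
import numpy as np

def slope(x, y):
    """OLS slope of y on x (with intercept): sample Cov(x, y) / sample Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def mean_slope(rho, n=200, reps=2_000, beta1=2.0, seed=1):
    """Average OLS slope over `reps` simulated samples of size n.

    x ~ N(0, 1) and eps = rho * x + v with v ~ N(0, 1) independent of x,
    so Cov(x, eps) = rho; rho = 0 is the exogenous case.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        eps = rho * x + rng.normal(size=n)
        y = 1.0 + beta1 * x + eps
        estimates[r] = slope(x, y)
    return estimates.mean()

for rho in (0.0, 0.5):
    print(f"Cov(x, eps) = {rho}: mean OLS slope = {mean_slope(rho):.3f} (true beta1 = 2.0)")
```

With `rho = 0.0` the average estimate should sit near the true value 2.0; with `rho = 0.5` it should sit near 2.5, and increasing `n` shrinks the spread of the estimates without removing that gap.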

For instance, in observational studies, which covers pretty much all of economics, you do not control the regressors. You cannot set US GDP to a desired level; you can only observe it. Hence, in a model where GDP is a regressor, the regressor has to be treated as stochastic, and you need the errors to be independent of GDP.

When your errors are correlated with the regressors, you have an endogeneity problem. There are ways to handle it, such as using lagged regressors or instrumental variables.

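In econometrics, a textbook example is the effect of price on demand in a standard system of supply and demand equations. The problem is that price is also determined by supply: it is set jointly with quantity in equilibrium, so it moves with the demand shocks that make up the error term. Hence there is an endogeneity problem, which any econometrician will promptly point out. This also answers your question about whether the assumption can be assessed: endogeneity is diagnosed mainly by reasoning about how the data are generated, rather than from the observed $(x_i,y_i)$ pairs alone.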
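A minimal linear version of this system (a hypothetical specification, with demand shock $u_i$ and supply shock $v_i$) shows why price ends up correlated with the demand error. Let demand be $$q_i=\alpha_0+\alpha_1 p_i+u_i$$ and supply be $$q_i=\gamma_0+\gamma_1 p_i+v_i.$$ Setting demand equal to supply and solving for the equilibrium price gives

$$p_i=\frac{\gamma_0-\alpha_0+v_i-u_i}{\alpha_1-\gamma_1}.$$

If $u_i$ and $v_i$ are uncorrelated, then $\operatorname{Cov}(p_i,u_i)=\operatorname{Var}(u_i)/(\gamma_1-\alpha_1)$, which is positive for a downward-sloping demand curve ($\alpha_1<0$) and an upward-sloping supply curve ($\gamma_1>0$). Price is therefore correlated with the error of the demand equation, and OLS of $q_i$ on $p_i$ does not estimate $\alpha_1$.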

Once you have established that endogeneity is present, you may look for a so-called instrumental variable. An instrument is a variable that is correlated with price but not with the error in the demand equation, i.e. something that shifts supply without shifting demand. If the demand is for oranges, then spring temperature in Florida might be a suitable instrument, because it affects the supply of oranges (and therefore the price) but not the demand. You then use this instrument, for example via two-stage least squares, to isolate the variation in price that comes from supply and tease out the effect of price on demand.
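The orange example can be sketched as a simulation in Python with NumPy. Everything below is hypothetical: the linear demand and supply equations, their coefficients, the helper names, and the shifter `z` standing in for Florida spring temperature are made up for illustration. With a single instrument and a single endogenous regressor, the IV (and 2SLS) slope reduces to $\operatorname{Cov}(z,q)/\operatorname{Cov}(z,p)$, which is what `iv_slope` computes:

```python
import numpy as np

def simulate_market(n, rng, alpha1=-1.0, gamma1=1.0, gamma2=1.0):
    """Simulate a hypothetical linear demand/supply system.

    Demand: q = 10 + alpha1 * p + u              (u: demand shock)
    Supply: q =  2 + gamma1 * p + gamma2 * z + v (v: supply shock)
    z stands in for spring temperature: it shifts supply only.
    Returns equilibrium price p, quantity q, and instrument z.
    """
    u = rng.normal(size=n)
    v = rng.normal(size=n)
    z = rng.normal(size=n)
    # Equate demand and supply and solve for the equilibrium price.
    p = (2 - 10 + gamma2 * z + v - u) / (alpha1 - gamma1)
    q = 10 + alpha1 * p + u
    return p, q, z

def ols_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def iv_slope(x, y, z):
    """IV slope with one instrument z: Cov(z, y) / Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

rng = np.random.default_rng(42)
p, q, z = simulate_market(100_000, rng)
print("true demand slope:   -1.0")
print(f"OLS slope of q on p: {ols_slope(p, q):.3f}")
print(f"IV slope using z:    {iv_slope(p, q, z):.3f}")
```

With these made-up parameters the OLS slope converges to $-1/3$ rather than the true demand slope $-1$, because price is positively correlated with the demand shock $u$, while the IV estimate lands close to $-1$. In applied work one would normally use a dedicated IV/2SLS routine, which also reports appropriate standard errors.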
