Random error in OLS regression, and how is it related to Gaussian noise?

Tags: error, least squares, regression

In OLS regression:

$$Y=\beta_0+\beta_1 X_1+ \beta_2 X_2+\beta_3 X_3 + \beta_4 X_4+\beta_5 X_5+\beta_6 X_6 + \varepsilon,$$

what is $\varepsilon$? Is it Gaussian noise or random error? What is the difference? Why do we add it to a multiple regression model? In most papers, authors refer to it as random error but without clarification.

I need a simple and good reason why authors add it to their model.

Best Answer

The Wikipedia definition is a fine one to use in your paper if you need one, but I think you're missing something.

The $\varepsilon$ is the random error, which is synonymous with noise. It is in the model because the predictors never determine $Y$ exactly; $\varepsilon$ collects the variation that the linear part leaves unexplained. In practice the random error may be Gaussian distributed, in which case it is Gaussian noise, but it can follow other distributions. If $\varepsilon$ is Gaussian, you have met one of the theoretical assumptions of the model, and things like interval estimation are better justified. If it is not Gaussian then, as Glen_b said, the OLS estimator is still the best linear unbiased estimator (BLUE), as long as the errors have mean zero, constant variance, and are uncorrelated.
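To make the last point concrete, here is a minimal simulation sketch using only the Python standard library. It is not from the original discussion: the single-predictor model, the true coefficients (1 and 2), the sample sizes, and the helper names `ols_fit` and `average_slope` are all assumptions chosen for illustration. It fits OLS many times with Gaussian noise and with uniform (non-Gaussian) noise of the same variance, and averages the estimated slope in each case.

```python
import math
import random
import statistics


def ols_fit(x, y):
    """Closed-form OLS for one predictor; returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def average_slope(noise, n=50, reps=2000, beta0=1.0, beta1=2.0, seed=0):
    """Average the OLS slope over many simulated data sets (hypothetical setup)."""
    rng = random.Random(seed)
    x = [i / n for i in range(n)]  # fixed design points in [0, 1)
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + noise(rng) for xi in x]
        slopes.append(ols_fit(x, y)[1])
    return statistics.fmean(slopes)


def gaussian_noise(rng):
    # Mean 0, standard deviation 1
    return rng.gauss(0.0, 1.0)


def uniform_noise(rng):
    # Uniform on [-sqrt(3), sqrt(3)]: mean 0, standard deviation 1, not Gaussian
    half_width = math.sqrt(3.0)
    return rng.uniform(-half_width, half_width)


slope_gauss = average_slope(gaussian_noise)
slope_unif = average_slope(uniform_noise)
print(f"average slope, Gaussian noise: {slope_gauss:.3f}")
print(f"average slope, uniform noise:  {slope_unif:.3f}")
```

Under these assumptions both averages should land close to the true slope of 2, which is what unbiasedness predicts regardless of the noise distribution. What the Gaussian assumption buys is the exact justification for the usual intervals and tests, not the unbiasedness itself.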

The classical theory assumes that the random error (noise) is Gaussian, but the noise in any particular data set could follow any distribution. So, to answer your question, you'd need to say whether you want to know the distribution of your particular noise or the distribution the noise is assumed to have. For the former you'd need data.
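As a rough sketch of what "you'd need data" could look like in practice: once the model is fitted, the residuals stand in for the unobserved $\varepsilon$, and their shape can be inspected. The example below uses simulated data because no real data set is given here; the helper names `ols_residuals` and `skewness_and_excess_kurtosis`, the coefficients, and the choice of an exponential distribution for the skewed case are all assumptions. It compares sample skewness and excess kurtosis of the residuals, both of which are near 0 for Gaussian noise.

```python
import random
import statistics


def ols_residuals(x, y):
    """Fit one-predictor OLS and return the residuals y - fitted."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (0 and 0 for a Gaussian)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m3 = statistics.fmean([(v - m) ** 3 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(2000)]

# Hypothetical data sets: same linear part, different noise distributions
y_gauss = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
y_skewed = [1.0 + 2.0 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]

skew_g, kurt_g = skewness_and_excess_kurtosis(ols_residuals(x, y_gauss))
skew_s, kurt_s = skewness_and_excess_kurtosis(ols_residuals(x, y_skewed))
print(f"Gaussian noise: skewness {skew_g:.2f}, excess kurtosis {kurt_g:.2f}")
print(f"skewed noise:   skewness {skew_s:.2f}, excess kurtosis {kurt_s:.2f}")
```

With real data you would replace the simulated `x` and `y` with your own observations. A histogram or Q-Q plot of the residuals is a common complement to these two summary numbers.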
