Linear regression and assumptions about response variable

Tags: assumptions, generalized linear model, regression

Wikipedia states:

Ordinary linear regression predicts the expected value of a given unknown quantity (the response variable, a random variable) as a
linear combination of a set of observed values (predictors). This
implies that a constant change in a predictor leads to a constant
change in the response variable (i.e. a linear-response model). This
is appropriate when the response variable has a normal distribution.

So Wikipedia makes an assumption about the response variable, namely that it is normally distributed.
However, other sources, including answers here on Stack Exchange, say that normality is required of the error terms, and that if the errors are not normally distributed we should use some generalized linear model instead.

What is the Wikipedia article referring to, or is it simply wrong?

Best Answer

The Wikipedia statement

This is appropriate when the response variable has a normal distribution.

is wrong.

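OLS does NOT make assumptions about the distribution of the response variable itself. Its assumptions concern the errors, which in practice are checked through the residuals (see the Gauss–Markov theorem, which only requires the errors to have zero mean, constant variance and no correlation; normality is not needed for OLS to be the best linear unbiased estimator). Also see this post for details: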
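To illustrate the Gauss–Markov point, here is a minimal simulation sketch in Python with NumPy. It is an assumption-laden illustration, not part of the original answer: the true coefficients, the sample sizes, and the choice of shifted Exponential(1) errors (mean 0, variance 1, clearly skewed) are all made up for the demo, and `ols_fit` / `simulate_ols_bias` are hypothetical helper names. Averaged over many replications, the OLS estimates should land close to the true coefficients even though the errors are far from normal.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares coefficients via NumPy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate_ols_bias(n_reps=2000, n=50, beta_true=(2.0, 3.0), seed=0):
    """Average OLS estimates over repeated samples with skewed, non-normal errors."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column
    beta_true = np.asarray(beta_true)
    estimates = np.empty((n_reps, 2))
    for r in range(n_reps):
        # Exponential(1) shifted by its mean: zero mean, variance 1, but skewed.
        eps = rng.exponential(1.0, size=n) - 1.0
        y = X @ beta_true + eps
        estimates[r] = ols_fit(X, y)
    return estimates.mean(axis=0)


if __name__ == "__main__":
    print("average estimates:", simulate_ols_bias())
```

Normality of the errors only starts to matter when you want exact small-sample t and F inference; unbiasedness, as simulated here, does not depend on it.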

Why linear regression has assumption on residual but generalized linear model has assumptions on response?

I am stealing @Cliff AB's example here. The following distributions of $y$ and of the residuals do not violate any OLS assumption!

[Figure: distribution of $y$ and distribution of the residuals]
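A situation like the one in the figure can be reproduced with a short simulation. This is a sketch under stated assumptions, not a reconstruction of the original figure: it assumes a hypothetical binary predictor `group` that shifts the mean of $y$ by 10, with normal errors of standard deviation 1, and uses a hand-rolled `excess_kurtosis` helper (also hypothetical) as a crude shape summary (about 0 for normal data, strongly negative for a two-humped distribution). The marginal distribution of $y$ is then bimodal, while the OLS residuals look normal.

```python
import numpy as np


def excess_kurtosis(v):
    """Sample excess kurtosis: about 0 for normal data, negative for bimodal data."""
    v = np.asarray(v, dtype=float)
    d = v - v.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2 - 3.0


def simulate_bimodal_response(n=10_000, seed=1):
    """Simulate a bimodal response whose regression errors are exactly normal."""
    rng = np.random.default_rng(seed)
    group = rng.integers(0, 2, size=n)  # hypothetical binary predictor (0 or 1)
    eps = rng.normal(0.0, 1.0, size=n)  # errors satisfy the normal-error assumption
    y = 10.0 * group + eps              # marginally, y is a mixture of N(0,1) and N(10,1)
    X = np.column_stack([np.ones(n), group])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return y, resid, beta


if __name__ == "__main__":
    y, resid, beta = simulate_bimodal_response()
    print("coefficients:", beta)
    print("excess kurtosis of y:", excess_kurtosis(y))
    print("excess kurtosis of residuals:", excess_kurtosis(resid))
```

For an equal mixture of N(0,1) and N(10,1) the theoretical excess kurtosis of $y$ is about -1.85, while the residuals should come out near 0: a histogram of $y$ alone would "fail" a normality check even though the regression model is correctly specified.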

Related posts:

What is a complete list of the usual assumptions for linear regression?

How does linear regression use the normal distribution?

What if residuals are normally distributed, but y is not?