Linear regression with strongly non-normal response variable


I have carried out a linear regression. The plot below shows the distribution of the response variable:

[Plot: distribution of the response variable]

I believe the response variable is beta distributed, and therefore almost the exact opposite of normally distributed. However, when I include all my predictor variables in the linear regression, the residuals turn out to be fairly normally distributed, as this plot shows:

[Plot: distribution of the residuals]

Has my model satisfied the assumptions of linear regression? Might there be a better model to use?

Best Answer

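The distribution of the response is irrelevant: the normality assumption of linear regression concerns the errors, i.e. the response conditional on the predictors, not the marginal distribution of the response. Inference based on small samples requires the errors to be approximately normal (it is better to look at a QQ-plot of the residuals than at their density, because the tails are what matter). If you are only interested in descriptive results, or if the sample size is not too small (the coefficient estimates are then approximately normal anyway by the central limit theorem), you therefore do not need to worry about normality.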
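A small simulation can illustrate why the shape of the response says little about the errors. This is a minimal sketch on hypothetical simulated data (a binary predictor shifting the response by 20, errors N(0, 1)), using only the Python standard library. The helper names `qq_correlation` and `fit_simple_ols` are illustrative. The "QQ correlation" (the correlation between the sorted values and standard normal quantiles, i.e. how straight the QQ-plot is) is a numerical stand-in for looking at the plot, not a replacement for it.

```python
import random
from statistics import NormalDist, fmean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def qq_correlation(values):
    """Correlation between sorted values and standard normal quantiles.

    These pairs are the points of a normal QQ-plot; a value close to 1
    means the points lie close to a straight line."""
    n = len(values)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return pearson(sorted(values), theoretical)


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical data: the binary predictor makes the response strongly
# bimodal, but the errors themselves are normal with standard deviation 1.
rng = random.Random(1)
x = [rng.randint(0, 1) for _ in range(500)]
y = [20 * xi + rng.gauss(0, 1) for xi in x]

a, b = fit_simple_ols(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"QQ correlation of response:  {qq_correlation(y):.3f}")
print(f"QQ correlation of residuals: {qq_correlation(residuals):.3f}")
```

With these settings the response's QQ-plot is a step and its QQ correlation should come out well below 1, while the residuals should be very close to 1, even though the response is about as non-normal as it gets.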

Much more important are the other assumptions of linear regression (correct model structure, no large outliers in the predictors and, if you are interested in inference, homoscedastic and uncorrelated errors).
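Numerical diagnostics can complement residual plots for two of these assumptions. The sketch below is again standard-library Python on hypothetical simulated data. It computes two common statistics that the answer itself does not mention: Koenker's studentized Breusch-Pagan statistic (n times the R^2 from regressing squared residuals on the predictor) for heteroscedasticity, and the Durbin-Watson statistic for first-order autocorrelation of the errors in observation order. The helper names and the three scenarios are illustrative, and the quoted critical value of about 3.84 assumes a single predictor (chi-squared with 1 degree of freedom).

```python
import random
from statistics import fmean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def residuals_of(x, y):
    """Residuals of the simple OLS fit of y on x."""
    a, b = fit_simple_ols(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def breusch_pagan_lm(x, resid):
    """Koenker's studentized Breusch-Pagan statistic, n * R^2 of the
    auxiliary regression of squared residuals on x. Approximately
    chi-squared(1) under homoscedasticity (5% critical value ~3.84)."""
    e2 = [r * r for r in resid]
    aux = residuals_of(x, e2)
    me2 = fmean(e2)
    r2 = 1 - sum(u * u for u in aux) / sum((v - me2) ** 2 for v in e2)
    return len(x) * r2


def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 suggests no first-order
    autocorrelation, near 0 strong positive autocorrelation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(r * r for r in resid)


rng = random.Random(2)
n = 500
x = [rng.uniform(1, 10) for _ in range(n)]

# Hypothetical scenario 1: independent errors with constant variance.
y_ok = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]

# Hypothetical scenario 2: the error standard deviation grows with x.
y_het = [2 + 3 * xi + rng.gauss(0, xi) for xi in x]

# Hypothetical scenario 3: AR(1) errors, e_t = 0.8 * e_{t-1} + noise.
e, ar_errors = 0.0, []
for _ in range(n):
    e = 0.8 * e + rng.gauss(0, 1)
    ar_errors.append(e)
y_ar = [2 + 3 * xi + ei for xi, ei in zip(x, ar_errors)]

for name, y in [("constant variance", y_ok),
                ("variance grows with x", y_het),
                ("AR(1) errors", y_ar)]:
    r = residuals_of(x, y)
    print(f"{name:22s} BP LM = {breusch_pagan_lm(x, r):7.2f}   "
          f"DW = {durbin_watson(r):.2f}")
```

The Durbin-Watson statistic is only meaningful when the row order carries information (e.g. time or space); for a cross-sectional sample in arbitrary order it says nothing. Neither statistic replaces plotting residuals against fitted values and each predictor, which also reveals a wrong model structure.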