Normality of residuals in a regression model with a categorical IV

Tags: assumptions, r, regression

I have run a simple regression with a continuous response variable and a categorical explanatory variable (with 2 levels). I am currently checking that the model meets the assumptions of regression. I produced the following plot:

[Image: the plot referred to above is not available]

I'm aware that I need to check that the residuals are normally distributed. Do I need to check the distribution of residuals at each of the 2 levels of the explanatory variable? Or do I need to check the distribution of all residuals simultaneously?

Best Answer

(Note that a regression model with only 1 explanatory variable that is categorical and has just 2 levels is equivalent to an independent-samples t-test with equal variances assumed; there's nothing wrong with calling it a regression, but it would most commonly be discussed / referred to as a t-test.)
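
To make the equivalence concrete, here is a minimal sketch on simulated data. The variable names (`y`, `group`), group sizes, and means are my own assumptions for illustration, not your data. It fits the regression with `lm()` and runs the pooled-variance t-test with `t.test(..., var.equal = TRUE)`, then compares the two.

```r
# Simulated data; y, group, the group sizes and the means are illustrative assumptions.
set.seed(1)
group <- factor(rep(c("A", "B"), times = c(20, 30)))
y <- rnorm(50, mean = ifelse(group == "A", 10, 12), sd = 2)

fit <- lm(y ~ group)                        # regression on a 2-level factor
tt  <- t.test(y ~ group, var.equal = TRUE)  # pooled-variance (Student) t-test

# t statistic and p-value for the group coefficient in the regression
t_lm <- coef(summary(fit))["groupB", "t value"]
p_lm <- coef(summary(fit))["groupB", "Pr(>|t|)"]

# The two t statistics agree up to sign (lm reports B - A, t.test reports A - B),
# and the p-values are the same.
same_t <- all.equal(abs(t_lm), abs(unname(tt$statistic)))
same_p <- all.equal(p_lm, tt$p.value)
print(c(same_t = same_t, same_p = same_p))
```

Note that the match relies on `var.equal = TRUE`; the default `t.test()` is Welch's version, which does not pool the variances and will generally give a slightly different result.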

You check the distribution of all the residuals simultaneously: the model assumes a single normal error distribution, and the residuals are already centered on their own group means, so they can be pooled. There are tests for normality, but I'm not a huge fan of them (I listed some in my answer to your previous question). I think the best option is to make a qq-plot. You can find a really nice version (qqPlot, called qq.plot in older versions) in John Fox's car package. Among other features, it'll give you a 95% confidence band, which can help you interpret the plot.
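
A rough sketch of what that looks like in R, again on simulated data (the names and numbers are assumptions for illustration). The base-R `qqnorm()` / `qqline()` pair works without any extra packages; the `car::qqPlot()` call is guarded so the code still runs if car is not installed.

```r
# Simulated data; names and parameter values are illustrative assumptions.
set.seed(2)
group <- factor(rep(c("A", "B"), each = 25))
y <- rnorm(50, mean = ifelse(group == "A", 10, 12), sd = 2)
fit <- lm(y ~ group)

# All residuals from the model, pooled across both groups
res <- residuals(fit)

# Base-R normal QQ plot with a reference line
qqnorm(res)
qqline(res)

# car's version draws a confidence envelope around the reference line by default
if (requireNamespace("car", quietly = TRUE)) {
  car::qqPlot(res)
}
```

Points that stay inside the envelope and close to the line are consistent with normal errors; systematic curvature or points drifting outside at the ends suggest skew or heavy tails.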

On a different note, judging from your plot (I can't tell whether you simply have more data in the second group), you should also check that you have homogeneity of variance, i.e., that the spread of the residuals is about the same in both groups.
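
One possible way to check this, sketched on simulated data with deliberately unequal spreads (all names and numbers are assumptions for illustration). The specific tools here are my choices rather than the only options: per-group standard deviations and a boxplot for an informal look, `fligner.test()` from base R as a formal test, and Welch's t-test as a fallback that does not assume equal variances.

```r
# Simulated data with unequal group sizes and unequal spread (illustrative assumptions).
set.seed(3)
group <- factor(rep(c("A", "B"), times = c(15, 35)))
y <- rnorm(50,
           mean = ifelse(group == "A", 10, 12),
           sd   = ifelse(group == "A", 1, 3))

# Standard deviation within each group
group_sd <- tapply(y, group, sd)
print(group_sd)

# Graphical check: compare the spread of the two boxes
boxplot(y ~ group)

# Fligner-Killeen test of equal variances (base R)
fl <- fligner.test(y ~ group)
print(fl)

# If the variances look unequal, Welch's t-test does not assume they are equal
welch <- t.test(y ~ group)  # var.equal = FALSE is the default
print(welch)
```

With unequal group sizes, the group with more observations will usually show a wider range of points even when the variances are equal, so compare standard deviations (or box heights) rather than the raw extent of the points.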