Solved – Difference between normality of residuals vs normality in each group.

anova, group-differences, normality-assumption, residuals

I've been searching the internet and books for quite a long time now and have come to the conclusion that checking normality of the residuals (each value minus the mean of its group) is the same as checking normality of the distribution of y in each group separately. (See "Normality of dependent variable = normality of residuals?")
However, when I apply both of these methods to my dataset, the residual method gives me a very different answer than checking normality of Y in each group.
For instance, with the residual method (only one graph) I find that age and height are not normally distributed but that waist circumference is. Looking at the distribution of Y in each group, I find that age and height are normally distributed in each of the 4 groups but that waist circumference is normally distributed only in groups 1, 2 and 4, not in group 3.
How do I interpret these results? Which method should I use? I want to use ANOVA and p-values, so I need my data to be normally distributed. I have 4 groups with 20 data points in each group.

Best Answer

Not knowing which methods you used to test for the normality of the residuals and the dependent variable, respectively, it's difficult for me to give you an exact answer. However, I assume that you used a visual comparison or some kind of significance test to check for normality.

Since you mentioned that you only have 20 data points per group, I think that the problem lies with the sample size. Suppose you use a standard "off-the-shelf" frequentist test to assess the normality of a group, for example the Shapiro-Wilk test. You are then essentially comparing your (standardised and ordered) sample with what you would expect from an ordered sample drawn from a standard normal distribution. If your sample deviates too much from that expectation, the difference is deemed "significant" (for example at the 0.05 level), giving you a hint that your sample should not be regarded as normally distributed.

But the Shapiro-Wilk test, like most normality tests, is highly sensitive to the sample size. If your sample size is too small, it is very hard to detect a difference from a normally distributed sample, so most of the test results will be non-significant. If you increase the sample size, however, even small deviations from the normal distribution will turn out to be "significant".
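To make the sample-size effect concrete, here is a small simulation sketch. It is not based on your data: the gamma distribution, the number of simulations, and the helper name `rejection_rate` are all assumptions chosen purely for illustration. It draws mildly right-skewed samples of different sizes and counts how often `scipy.stats.shapiro` rejects normality at the 0.05 level.

```python
import numpy as np
from scipy import stats


def rejection_rate(n, n_sim=500, alpha=0.05, seed=0):
    """Share of simulated samples of size n for which Shapiro-Wilk rejects normality.

    Hypothetical helper: the data come from a gamma distribution (shape 4),
    which is mildly right-skewed, so the null hypothesis is false by construction.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        sample = rng.gamma(shape=4.0, scale=1.0, size=n)
        if stats.shapiro(sample).pvalue < alpha:
            rejections += 1
    return rejections / n_sim


# Same underlying distribution, three different sample sizes.
rates = {n: rejection_rate(n) for n in (20, 80, 320)}
for n, rate in rates.items():
    print(f"n = {n:3d}: normality rejected in {rate:.0%} of simulated samples")
```

The underlying distribution is identical in all three cases; only the sample size changes, so any difference in the rejection rates reflects the power of the test rather than a difference in the data.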

This is probably what happened in your case. When you were using the residuals to test for normality, you had a total of 4 × 20 = 80 data points for each variable (age, height and waist circumference), with the result that two out of three tests turned out to be significant. When you were using the data points within each group, you conducted more tests in total (4 × 3 = 12 tests, one for each variable in each group), but due to the small sample size in each test only 1 out of 12 tests turned out to be significant (waist circumference in group 3).
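The two approaches can be put side by side in a short sketch. Everything here is simulated and hypothetical (the group means, the error distribution, and the variable names are assumptions, not your data); it only shows how the pooled residual test sees 80 points while each per-group test sees 20.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_means = [60.0, 65.0, 70.0, 75.0]  # hypothetical group means
n_per_group = 20

# Four groups sharing the same mildly skewed error distribution (an assumption).
groups = [m + rng.gamma(shape=4.0, scale=1.0, size=n_per_group) for m in group_means]

# Residuals: each value minus the mean of its own group, pooled into one sample.
residuals = np.concatenate([g - g.mean() for g in groups])

# One test on all 80 residuals ...
pooled_p = stats.shapiro(residuals).pvalue

# ... versus four tests on 20 values each. Shapiro-Wilk does not depend on the
# location of the sample, so testing the raw values of a group gives the same
# result as testing that group's residuals.
per_group_p = [stats.shapiro(g).pvalue for g in groups]

print(f"pooled residuals (n = {residuals.size}): p = {pooled_p:.3f}")
for i, p in enumerate(per_group_p, start=1):
    print(f"group {i} (n = {n_per_group}): p = {p:.3f}")
```

With real data you would replace the simulated `groups` with the observed values of one variable split by group, and repeat this for each variable.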

I hope that helped to clear things up a bit. In order to give you any meaningful recommendation on which methods to use, you need to present more information on your data set and the exact tests you used.
