Hypothesis Testing – Should Groups or the Whole Sample Be Normally Distributed for the Normality Assumption?

hypothesis-testing, normality-assumption

This is a general and perhaps basic question, but answering it will help me avoid a mistake. To test the dependence of a continuous variable on a nominal one with a parametric test (t-test, ANOVA), "the data have to be normally distributed" (the distribution must not deviate much from normal).

But one point is not clear to me: do I have to test the normality of the continuous data as a whole (the entire sample), or the normality of the data within every single group?

And if my groups are small (n = 4–6), and I therefore fail to reject the null hypothesis of normality, does that give me permission to use parametric methods?

Can anyone provide a link to some good examples of statistical (biomedical) hypothesis testing with R? (Google did not help.)

Best Answer

Generally it is the residuals that need to be normally distributed. This implies that each group is normally distributed, but you can do the diagnostics on the residuals (values minus the group mean) as a whole rather than group by group. It is possible (and even common) for the data to be approximately normal within each group and yet, because the group means differ, for the overall dataset to be quite non-normal; normal theory tests are still appropriate in that case.
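A minimal sketch of this in R, with simulated data (the group labels, means, and sizes are made up for illustration):

    # Three groups with different means but the same normal errors
    set.seed(42)
    g <- factor(rep(c("A", "B", "C"), each = 20))
    y <- c(rnorm(20, mean = 0), rnorm(20, mean = 3), rnorm(20, mean = 6))

    # Residuals: each value minus its group mean (ave() gives groupwise means)
    res <- y - ave(y, g)

    # The pooled raw values look non-normal (a mixture of three means),
    # while the pooled residuals look normal
    op <- par(mfrow = c(1, 2))
    qqnorm(y, main = "Raw values"); qqline(y)
    qqnorm(res, main = "Residuals"); qqline(res)
    par(op)

    # Equivalently, fit the one-way ANOVA and inspect its residuals
    fit <- aov(y ~ g)
    qqnorm(residuals(fit)); qqline(residuals(fit))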

Note that the real question is not "exactly normal" but "normal enough for the given problem". With small datasets the question of normality matters most, but you have low power to detect non-normality (unless it is very extreme); with large datasets the Central Limit Theorem kicks in, so your data do not need to be that normal, but you have high power to detect even small departures from normality. So when doing formal tests of normality as a precondition for t-tests or ANOVA, you are either in the situation where you have a meaningless answer to a meaningful question, or a meaningful answer to a meaningless question (there may be some middle size where both are meaningful, but I suspect that the middle range is really where both are meaningless).
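A small simulation sketch of that trade-off in R (the sample sizes, alternative distributions, and 5% cutoff are arbitrary choices for illustration):

    # Proportion of Shapiro-Wilk rejections at the 5% level
    set.seed(1)
    reject_rate <- function(n, rdist, nsim = 1000) {
      mean(replicate(nsim, shapiro.test(rdist(n))$p.value < 0.05))
    }

    # n = 5: low power even against a strongly skewed (exponential) alternative
    reject_rate(5, rexp)

    # n = 1000: a moderately skewed alternative is flagged most of the time,
    # although the CLT makes t-tests fairly robust at this size anyway
    reject_rate(1000, function(n) rchisq(n, df = 10))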

So no, the fact that a small sample fails to reject the null does not mean it is safe to use normal theory methods. Knowledge about the source of the data and some diagnostic plots are likely to be more useful in that decision; or, if you are worried about non-normality, just go straight to the non-parametric tests.
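For reference, the usual non-parametric analogues are one-liners in R (y and g here are the simulated data from the sketch above):

    d <- data.frame(y = y, g = g)

    # Two groups: Wilcoxon rank-sum test in place of the t-test
    wilcox.test(y ~ g, data = d, subset = g %in% c("A", "B"))

    # Three or more groups: Kruskal-Wallis test in place of one-way ANOVA
    kruskal.test(y ~ g, data = d)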

If you really feel the need for a p-value testing exact normality then you can use the SnowsPenultimateNormalityTest function in the TeachingDemos package for R (but be sure to read the help page).
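Calling it takes one line; what it reports, and why, is explained on that help page:

    library(TeachingDemos)
    x <- rnorm(25)                     # any sample will do
    SnowsPenultimateNormalityTest(x)   # then read ?SnowsPenultimateNormalityTest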

Another option for testing "normal enough", if you need more than the diagnostic plots, is to use the methodology in:

 Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne, D. F.,
 and Wickham, H. (2009). Statistical inference for exploratory data analysis
 and model diagnostics. Phil. Trans. R. Soc. A, 367, 4361-4383.
 doi: 10.1098/rsta.2009.0120

(the vis.test function in the TeachingDemos package for R is one implementation of this).
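A sketch of how that call might look, assuming the vt.qqnorm helper that I believe ships with TeachingDemos for this purpose (check ?vis.test for the exact interface):

    library(TeachingDemos)

    # Lineup test: the Q-Q plot of the real residuals is hidden among
    # Q-Q plots of simulated normal data; if you cannot pick it out,
    # the data are "normal enough" in the Buja et al. (2009) sense.
    # (res is the pooled-residuals vector from the first sketch above.)
    vis.test(res, FUN = vt.qqnorm)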

The important thing to take away is that knowledge about the process that produced your data is much more important than the output of some program/algorithm written by someone who knows/knew much less about your data and question than you do.