Outlier detection and normality assumption

normality-assumption, outliers, self-study, statistical-significance

I am taking an applied statistics class in which we were taught five or six tests for outlier detection and normality, then told to apply them to some datasets. I am quite confused about how to go about it.

I plotted a histogram of each dataset and noticed that some of them were clearly not normal, so I rejected normality for those with the test and skipped outlier detection (as the outlier tests presume the population is normally distributed). So far so good.

The problem comes when the histogram looks approximately normal. How can I know whether normality is a valid assumption or not? I could apply the normality test, but its result could be invalidated by possible outliers in the data. Yet to know whether there are outliers, I need normality. It's a serpent biting its own tail.

I have to start from something. For example, if I want to use a test for outliers, I can filter out the datasets that are not normally distributed by looking at the histogram. But that is subjective, so what is the point of applying a "precise" outlier test after cherry-picking the datasets? To me, it has the same value as detecting outliers directly by looking at the histogram.

What is a sound way to proceed?

Best Answer

I think these two previously answered questions will be very useful as you think about testing for normality:

(1) how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sample

(2) is-normality-testing-essentially-useless

This answer has some especially useful info about test selection based on your N and what the tails of the distribution look like.

I would not recommend basing test selection on the results of a normality test. Instead, I would look at the data, think about the distribution you expect given the type of data you have, and follow the advice in the links above. A QQ plot can be helpful for eyeballing whether the data are approximately normal and for checking whether a simple transformation of the data could help.
