Solved – Normality test with large data set

hypothesis testing, kolmogorov-smirnov test, normality-assumption

I think I am dealing with this issue
http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/

I have a large data set (10k data points) that diverges slightly from normal, and I get a p-value of 0. I would like a cruder test that tells me whether the data are extremely divergent from normal or look roughly normally distributed. I am currently trying Kolmogorov-Smirnov, and in both cases I just get a p-value of 0. Any alternatives?

I also looked at this:
Is normality testing 'essentially useless'?

So, taking all this into consideration, is there any kind of test I can perform that distinguishes between data that's roughly normal, and not normally distributed at all?

I am using scipy.

Best Answer

"a more crude test that tells me if data is extremely divergent from normal"

You very likely don't want hypothesis tests at all, since they don't answer that question.

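You're trying to answer a question related to "effect size" ("how far from normal is it?" is an effect-size type of question), whereas a hypothesis test only tells you whether the sample gives evidence of any departure from normality at all. With 10k points, even a trivial departure can produce a p-value that is effectively 0.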

Some goodness-of-fit statistics (as whuber suggested in comments) do give a measure of "how far from normal", as does looking at, say, a Q-Q plot, as Nick Cox mentioned, but I bet you have a more specific question that would not necessarily be well answered by picking some measure essentially at random.
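As a rough illustration of reading such statistics as distances rather than as tests, here is a minimal sketch in scipy. The helper name `normality_distance`, the choice of measures, and the two simulated samples are assumptions made for the example, not something from the question. It reports the Kolmogorov-Smirnov statistic D (ignoring its p-value), the correlation of the points on a normal Q-Q plot, and the sample skewness and excess kurtosis.

```python
import numpy as np
from scipy import stats


def normality_distance(x):
    """Descriptive measures of departure from normality (no p-values).

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    # Standardize with the sample mean and standard deviation so that D
    # compares the shape of the data with a standard normal.
    z = (x - x.mean()) / x.std(ddof=1)
    # D is the largest vertical gap between the empirical CDF and the
    # normal CDF; the p-value is deliberately not used.
    ks_d = stats.kstest(z, "norm").statistic
    # r is the correlation between ordered data and normal quantiles,
    # i.e. how straight the Q-Q plot is (1.0 means perfectly straight).
    (_, _), (_, _, qq_r) = stats.probplot(x, dist="norm")
    return {
        "ks_D": ks_d,
        "qq_r": qq_r,
        "skewness": stats.skew(x),
        "excess_kurtosis": stats.kurtosis(x),  # 0 for a normal distribution
    }


# Simulated stand-ins for "roughly normal" and "not normal at all".
rng = np.random.default_rng(0)
near = normality_distance(rng.standard_t(df=20, size=10_000))
far = normality_distance(rng.exponential(size=10_000))

for name, result in (("near-normal", near), ("far-from-normal", far)):
    print(name, {k: round(float(v), 3) for k, v in result.items()})
```

Which of these numbers matters, and what size counts as "too far", still depends on what the data will be used for; the sketch only shows that the statistics separate the two cases even when both p-values would be 0.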

For example, you might perhaps have an underlying question nearer to "How badly could this particular kind of inference I wish to use be affected by the kind of (and degree of) non-normality of the distribution from which my data were drawn?"

[The answer to that depends on what you're trying to do! Different forms of inference may be sensitive to different aspects of the distribution, and to differing degrees. More detail would be necessary before useful advice could be given for a question like that.]
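One way to approach a question of that kind is by simulation. The sketch below is only an example under stated assumptions: it supposes the inference of interest is a 95% t-interval for the mean and that the data come from an exponential distribution, neither of which is stated in the question. The function `t_interval_coverage` is a hypothetical helper that estimates how often the interval actually covers the true mean.

```python
import numpy as np
from scipy import stats


def t_interval_coverage(sampler, true_mean, n, reps=2000, level=0.95, seed=0):
    """Estimate the actual coverage of a nominal t-interval for the mean.

    Hypothetical helper: `sampler(rng, size)` draws an array of the given
    shape from the assumed population with mean `true_mean`.
    """
    rng = np.random.default_rng(seed)
    samples = sampler(rng, (reps, n))  # one simulated data set per row
    means = samples.mean(axis=1)
    std_errs = samples.std(axis=1, ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + level / 2, df=n - 1)
    # The interval covers the true mean when the estimate lies within
    # t_crit standard errors of it.
    covered = np.abs(means - true_mean) <= t_crit * std_errs
    return covered.mean()


def draw_exponential(rng, size):
    # Assumed population: exponential with mean 1 (strongly skewed).
    return rng.exponential(size=size)


cov_small = t_interval_coverage(draw_exponential, true_mean=1.0, n=10)
cov_large = t_interval_coverage(draw_exponential, true_mean=1.0, n=1000)
print(f"n=10:   coverage ~ {cov_small:.3f}")
print(f"n=1000: coverage ~ {cov_large:.3f}")
```

Comparing the estimated coverage with the nominal 95% gives a measure of impact on that particular inference; a different procedure (a variance test, a prediction interval) would need its own simulation and could behave quite differently.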

You may have some other kind of question of course, but how to measure impact on whatever you're trying to do will again depend on what you're trying to achieve.
