I was wondering what the criteria are for choosing between Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling when comparing 2 ECDFs. I know how the mathematics of each differs, but if I have some ECDF data, how would I know which test is appropriate to use?
Kolmogorov-Smirnov Test – 2-Sample Kolmogorov-Smirnov vs. Anderson-Darling vs. Cramer-von-Mises
anderson-darling-test, kolmogorov-smirnov-test, two-sample
Related Solutions
There can be no single state-of-the-art for goodness of fit (for example no UMP test across general alternatives will exist, and really nothing even comes close -- even highly regarded omnibus tests have terrible power in some situations).
In general when selecting a test statistic you choose the kinds of deviation that it's most important to detect and use a test statistic that is good at that job. Some tests do very well at a wide variety of interesting alternatives, making them decent default choices, but that doesn't make them "state of the art".
The Anderson-Darling is still very popular, and with good reason. The Cramer-von Mises test is much less used these days (to my surprise, because it's usually better than the Kolmogorov-Smirnov but simpler than the Anderson-Darling -- and it often has better power than the Anderson-Darling against differences "in the middle" of the distribution).
All of these tests suffer from bias against some kinds of alternatives, and it's easy to find cases where the Anderson-Darling does much worse (terribly, really) than the other tests. (As I suggest, it's more 'horses for courses' than one test to rule them all). There's often little consideration given to this issue (what's best at picking up the deviations that matter the most to me?), unfortunately.
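To make the "horses for courses" point concrete, here is a minimal power-simulation sketch in R. It assumes the goftest package and uses a heavy-tailed t(3) alternative purely for illustration; swap in whatever alternative matters to you and the ranking of the tests can change.

## Estimate the power of KS, CvM and AD against one particular alternative,
## testing a fully specified N(0,1) null. The t(3) alternative is only an
## illustration -- the ranking of the tests depends on the alternative.
library(goftest)   # cvm.test() and ad.test() for fully specified nulls

set.seed(1)
nsim <- 2000
n <- 50
reject <- matrix(NA, nsim, 3, dimnames = list(NULL, c("KS", "CvM", "AD")))

for (i in seq_len(nsim)) {
  x <- rt(n, df = 3)                                    # heavy-tailed alternative
  reject[i, "KS"]  <- ks.test(x, "pnorm")$p.value < 0.05
  reject[i, "CvM"] <- cvm.test(x, "pnorm")$p.value < 0.05
  reject[i, "AD"]  <- ad.test(x, "pnorm")$p.value < 0.05
}

colMeans(reject)   # estimated rejection rates (power) for this alternative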
You may find some value in some of these posts:
2 Sample Kolmogorov-Smirnov vs. Anderson-Darling vs Cramer-von-Mises (about two-sample tests, but many of the statements carry over)
Motivation for Kolmogorov distance between distributions (more theoretical discussion but there are several important points about practical implications)
I don't think you'll be able to form a confidence interval for the CDF from the Cramer-von Mises and Anderson-Darling statistics, because those criteria are based on all of the deviations rather than just the largest one.
The actual Kolmogorov-Smirnov, Anderson-Darling and Cramer-von Mises tests are for completely specified distributions. You're estimating the mean and variance of the residuals in your code so you don't have completely specified distributions, which will make your p-values larger than they should be.
There's another test based on estimating parameters and using a Kolmogorov-Smirnov type statistic -- properly called a Lilliefors test; it's no longer distribution-free, and you need a different distribution for the test statistic depending on which distribution you start with and which parameters you estimate. Lilliefors did the normal and exponential cases.
The normal case with both parameters estimated can be done in R using lillie.test in the nortest package.
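As a minimal sketch of the difference (assuming the nortest package is installed; the simulated values are just a stand-in for your residuals):

library(nortest)

set.seed(1)
res <- rnorm(100, mean = 2, sd = 3)   # stand-in for your residuals

## Naive approach: plug the estimated mean and sd into the ordinary KS test.
## The reference distribution ignores the estimation, so p-values come out too large.
ks.test(res, "pnorm", mean = mean(res), sd = sd(res))

## Lilliefors test: same statistic, but the null distribution of the statistic
## accounts for the fact that the mean and sd were estimated from the data.
lillie.test(res)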
For the other two tests the same comments apply (though approximate adjustments are a little simpler); the versions you're using in goftest are again for completely specified distributions.
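For comparison, this is the situation the goftest versions are designed for -- a null distribution whose parameters are fixed in advance rather than estimated from the data (a sketch, assuming goftest is installed):

library(goftest)

set.seed(1)
x <- rnorm(50, mean = 2, sd = 3)

## Parameters specified before seeing the data, not estimated from it:
cvm.test(x, "pnorm", mean = 2, sd = 3)   # Cramer-von Mises, fully specified null
ad.test(x,  "pnorm", mean = 2, sd = 3)   # Anderson-Darling, fully specified null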
In the same package I mentioned earlier (nortest) there are versions of the Cramer-von Mises and Anderson-Darling tests for the case of testing normality. If you check the help on those functions, they specify that they're for the composite hypothesis of normality, which is what you seek here.
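A minimal sketch of those composite-hypothesis versions (again assuming nortest; the function names clash with goftest's, so the package is given explicitly):

set.seed(1)
res <- rnorm(100, mean = 2, sd = 3)   # stand-in for your residuals

nortest::cvm.test(res)   # Cramer-von Mises test of composite normality
nortest::ad.test(res)    # Anderson-Darling test of composite normality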
That won't necessarily make the p-values identical across SAS and R (they may not use the same approximations, for example) but if you use the corresponding tests they should be much closer.
There's an additional issue in your case -- it appears you're testing residuals (perhaps from an AR model, but it doesn't matter for the present point). Even the versions in nortest don't account for the dependence between residuals. They're for independent, identically distributed values from a normal distribution with unspecified mean and variance. Even if you had normal errors, the residuals are not independent and don't usually have exactly identical distributions.
So even if you account for the estimation issue, the tests still won't be exactly right. I don't know what SAS is doing, but my guess is it's probably not accounting for this non-i.i.d. issue either.
As a general rule, if you want to test normality I wouldn't use multiple tests (pick the one that best identifies the kinds of deviations from normality you most want to pick up), and indeed I wouldn't use those tests at all (though the Anderson-Darling is often a pretty decent choice) -- I'd use the Shapiro-Wilk or one of the tests related to it.
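Shapiro-Wilk is in base R, so no extra package is needed (a one-line sketch on stand-in residuals):

set.seed(1)
res <- rnorm(100, mean = 2, sd = 3)   # stand-in for your residuals
shapiro.test(res)                     # Shapiro-Wilk test of normality (composite hypothesis)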
On the other hand if I am trying to assess the suitability of a normality assumption for some model, I wouldn't use a formal hypothesis test at all. The problem is not "are the errors really normal?" (outside of simulated data are they ever actually normal? I seriously doubt it), it's "how much difference does it make?". That's an effect-size question, not a hypothesis testing question.
Best Answer
To cut a long story short: the Anderson-Darling test is generally considered to be more powerful than the Kolmogorov-Smirnov test.
Have a glance at this article comparing various tests (of normality, but the results carry over to comparing two distributions): Power Comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling Tests by Nornadiah Mohd Razali & Yap Bee Wah.
The Anderson-Darling test is much more sensitive to the tails of the distribution, whereas the Kolmogorov-Smirnov test is more sensitive to its center.
To sum up, I would recommend using the Anderson-Darling or possibly the Cramer-von Mises test to get a more powerful test.
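For the two-sample setting asked about here, a minimal sketch: base R provides the two-sample Kolmogorov-Smirnov test directly, and (as an assumption on my part about which add-on to use) the kSamples package provides a k-sample Anderson-Darling test:

set.seed(1)
x <- rnorm(200)
y <- rt(200, df = 3)        # same centre as x, but heavier tails

ks.test(x, y)               # two-sample KS: driven by the largest ECDF gap,
                            # which typically sits near the middle of the distributions
kSamples::ad.test(x, y)     # two-sample AD: weights the tails more heavily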