Kolmogorov-Smirnov Test vs Anderson-Darling Test – How to Choose for Goodness of Fit

anderson-darling-test, fat-tails, goodness-of-fit, kolmogorov-smirnov-test

I learned that the Kolmogorov-Smirnov test loses sensitivity (power) in the tails, so it is not adequate for testing goodness of fit for fat-tailed distributions. The Anderson-Darling test, however, is more sensitive in the tails and is therefore better than the KS test for fat-tailed distributions.

But is there any explanation (intuitive or theoretical) for this statement? Are there any good references on this?

Best Answer

The Kolmogorov-Smirnov test looks for the largest difference between the hypothesized cdf and the empirical cdf of the data.
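In symbols, writing $F$ for the hypothesized cdf and $\hat{F}$ for the empirical cdf of a sample of size $n$, the KS statistic is

$$D_n = \sup_x \left| \hat{F}(x) - F(x) \right|.$$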

There's another test -- the Cramér-von Mises test -- which looks at the sum of squares of the differences in cdf (at the data). It's often somewhat more sensitive than the Kolmogorov-Smirnov to the kind of differences we tend to want to pick up (because it can "accumulate" small but consistent differences rather than needing a single large one).
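In the same notation, the Cramér-von Mises statistic is

$$W^2 = n \int_{-\infty}^{\infty} \left( \hat{F}(x) - F(x) \right)^2 \, dF(x),$$

which in practice reduces to a sum of squared differences evaluated at the ordered data.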

The problem with both of those tests is that the empirical cdf is estimated more precisely in the tails than in the middle (in the same sense that a sample estimate of a population proportion near 0 or 1 has lower variance than one near 0.5).
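To see why, note that for fixed $x$, $n\hat{F}(x)$ counts how many of the $n$ observations fall at or below $x$, so it has a $\text{Binomial}(n, F(x))$ distribution, and hence

$$\operatorname{Var}\bigl(\hat{F}(x)\bigr) = \frac{F(x)\,(1-F(x))}{n},$$

which is largest at $F(x) = \tfrac12$ and shrinks toward zero as $F(x)$ approaches 0 or 1.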

The algebra to show this is not particularly difficult, but we don't need it to get intuition about what's going on; there's something even simpler we can do.

Here I simulate 100 data sets drawn from a standard uniform, each of sample size 25. I then draw the empirical cdf of each one (the first data set is shown in blue; the rest are in grey, and for those I don't plot the point at the left of each step, just the step itself):

ECDF plots of 100 samples of size 25 from a standard uniform
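Here's a minimal sketch of how one might reproduce this picture in Python (the original figure was presumably made with other tools; the seed and colours are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility
n, n_sets = 25, 100

fig, ax = plt.subplots()
for i in range(n_sets):
    x = np.sort(rng.uniform(size=n))
    y = np.arange(1, n + 1) / n  # ECDF jumps by 1/n at each order statistic
    ax.step(x, y, where="post", color="blue" if i == 0 else "0.75", lw=1)

ax.plot([0, 1], [0, 1], "k:", lw=1)  # population cdf F(x) = x
ax.set_xlabel("x")
ax.set_ylabel("empirical cdf")
plt.show()
```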

As we see in the plot, the (vertical) spread is widest when the population cdf is close to 0.5 and narrowest when the population cdf is close to 0 or 1; the population cdf ($F$) is a diagonal line from (0,0) to (1,1). This pattern of changing spread in the sampling distribution of the empirical cdf happens for every distribution; the spread (specifically, the standard deviation of $\hat{F}$) relates only to $n$ and $F(1-F)$.
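As a quick numerical check (again just a sketch; the larger number of simulated data sets is an arbitrary choice to stabilize the estimates), we can compare the standard deviation of $\hat{F}(x)$ across simulations with $\sqrt{F(1-F)/n}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sets = 25, 10000
samples = rng.uniform(size=(n_sets, n))

for x in (0.05, 0.25, 0.5, 0.75, 0.95):
    F_hat = (samples <= x).mean(axis=1)  # empirical cdf at x, per data set
    theory = np.sqrt(x * (1 - x) / n)    # F(x) = x for the standard uniform
    print(f"x={x:.2f}  sd(F_hat)={F_hat.std():.4f}  theory={theory:.4f}")
```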

We can make use of this additional information about the precision of the empirical cdf, which simple tests like the Kolmogorov-Smirnov and the Cramér-von Mises ignore.

If you calculate a weighted version of the Cramér-von Mises statistic, with weights inversely proportional to this variance, you end up with the Anderson-Darling statistic. That is, it correctly (optimally, in a particular sense) accounts for the fact that the cdf is more precisely estimated in the tails, and this makes it more sensitive to differences in the tails than the first two statistics, neither of which uses that fact.
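Concretely, the Anderson-Darling statistic is

$$A^2 = n \int_{-\infty}^{\infty} \frac{\bigl(\hat{F}(x) - F(x)\bigr)^2}{F(x)\,(1-F(x))} \, dF(x),$$

i.e. the Cramér-von Mises integrand divided by $F(1-F)$, which (up to the factor $1/n$) is exactly the variance of $\hat{F}(x)$ derived above. The weight grows without bound as $F(x)$ approaches 0 or 1, so deviations in the tails, where the empirical cdf is most precise, count for more.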