Solved – Understanding Kolmogorov-Smirnov test in R

kolmogorov-smirnov test, r, ties

I'm trying to understand the output of the Kolmogorov-Smirnov test function (two-sample, two-sided).
Here is a simple test.

x <- c(1,2,2,3,3,3,3,4,5,6)
y <- c(2,3,4,5,5,6,6,6,6,7)
z <- c(12,13,14,15,15,16,16,16,16,17)

ks.test(x,y)

#   Two-sample Kolmogorov-Smirnov test
#
#data:  x and y
#D = 0.5, p-value = 0.1641
#alternative hypothesis: two-sided
#
#Warning message:
#In ks.test(x, y) : cannot compute exact p-value with ties

ks.test(x,z)

#   Two-sample Kolmogorov-Smirnov test
#
#data:  x and z
#D = 1, p-value = 9.08e-05
#alternative hypothesis: two-sided
#
#Warning message:
#In ks.test(x, z) : cannot compute exact p-value with ties


ks.test(x,x)

#   Two-sample Kolmogorov-Smirnov test
#
#data:  x and x
#D = 0, p-value = 1
#alternative hypothesis: two-sided
#
#Warning message:
#In ks.test(x, x) : cannot compute exact p-value with ties

There are a few things I don't understand here.

  1. From the help, it seems that the p-value refers to the hypothesis var1 = var2. However, here that would mean that (at the p < 0.05 level) the test says:

    a. Cannot say that X = Y;

    b. Can say that X = Z;

    c. Cannot say that X = X (!)

Besides the fact that x appears to be different from itself (!), it also seems quite strange to me that the test supports x = z, as the two distributions have zero overlapping support. How is that possible?

  2. According to the definition of the test, D should be the maximum difference between the two probability distributions, but for instance in the case (x, y) it should be D = Max|P(x)-P(y)| = 4 (in the case when P(x), P(y) aren't normalized) or D = 0.3 (if they are normalized). Why is D different from that? (An ECDF computation is sketched right after this list.)

  3. I have intentionally made an example with many ties, as the data I'm working with contain lots of identical values. Why does this confuse the test? I thought it calculated a probability distribution, which should not be affected by repeated values. Any ideas?
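
For point 2, the D that ks.test reports comes from the empirical CDFs rather than from raw frequencies; a minimal sketch using base R's ecdf() (same x and y as above) shows where the 0.5 comes from:

# same x and y as above
x <- c(1,2,2,3,3,3,3,4,5,6)
y <- c(2,3,4,5,5,6,6,6,6,7)

Fx <- ecdf(x)
Fy <- ecdf(y)

# D is the largest vertical gap between the two empirical CDFs,
# evaluated at the pooled data values
grid <- sort(unique(c(x, y)))
max(abs(Fx(grid) - Fy(grid)))
# 0.5, matching the D reported by ks.test(x, y)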

Best Answer

The KS test is premised on testing the "sameness" of two independent samples from a continuous distribution (as the help page states). If that is the case, then the probability of ties should be astonishingly small (also stated). The test statistic is the maximum distance between the ECDFs of the two samples. The p-value is the probability of seeing a test statistic as high or higher than the one observed if the two samples were drawn from the same distribution. (It is not the "probability that var1 = var2". And furthermore, 1 - p-value is NOT that probability either.)

High p-values say you cannot claim statistical support for a difference, but low p-values are not evidence of sameness either; a low p-value is evidence of some difference, which is exactly what you see for x versus z. Low p-values can occur with low sample sizes (as your example provides) or in the presence of interesting but small differences, e.g. superimposed oscillatory disturbances. If you are working with situations with large numbers of ties, it suggests you may need to use a test that more closely fits your data situation.
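
To make that definition of the p-value concrete, here is a minimal permutation sketch (ks.test itself computes its p-value from exact or asymptotic formulas, hence the ties warning; this is just an illustration): it re-splits the pooled data at random many times and counts how often a D at least as large as the observed one appears. Because it conditions on the observed values, ties pose no special problem for it.

# same x and y as above
x <- c(1,2,2,3,3,3,3,4,5,6)
y <- c(2,3,4,5,5,6,6,6,6,7)

set.seed(1)
pooled <- c(x, y)
D_obs  <- suppressWarnings(ks.test(x, y)$statistic)
D_perm <- replicate(10000, {
  idx <- sample(length(pooled), length(x))   # random re-split of the pooled values
  suppressWarnings(ks.test(pooled[idx], pooled[-idx])$statistic)
})
mean(D_perm >= D_obs)   # Monte-Carlo p-value, unaffected by the ties warning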

My explanation of why ties were a violation of assumptions was not a claim that ties invalidated the results. The statistical properties of the KS test are, in practice, relatively resistant or robust to failure of that assumption. The main problem with the KS test, as I see it, is that it is excessively general and, as a consequence, under-powered to identify meaningful differences of an interesting nature: being sensitive to every kind of departure, it has rather low power against more specific alternative hypotheses.
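
For a feel of what "under-powered" means in practice, here is a small simulation sketch (sample size, shift, and replication count chosen arbitrarily) comparing how often ks.test and t.test detect a pure mean shift between two normal samples:

set.seed(42)
n_sim <- 2000
n     <- 20      # per-sample size
shift <- 0.5     # true difference in means

reject <- replicate(n_sim, {
  a <- rnorm(n)
  b <- rnorm(n, mean = shift)
  c(ks = ks.test(a, b)$p.value < 0.05,   # general-purpose test
    t  = t.test(a, b)$p.value  < 0.05)   # test targeted at a mean shift
})
rowMeans(reject)   # proportion of runs in which each test detects the shift

With settings like these the t-test typically rejects more often, which is the sense in which the KS test trades power for generality.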

On the other hand, I also see the KS test (or the "even more powerful" Anderson-Darling or Lilliefors test) used to test "normality" in situations where such a test is completely unwarranted, such as testing the normality of variables used as predictors in a regression model before the fit. One might legitimately want to test the normality of the residuals, since that is what is assumed in the modeling theory. Even then, modest departures from normality of the residuals do not generally challenge the validity of the results. People would be better off using robust methods to check for any important impact of "non-normality" on conclusions about statistical significance.
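
If the goal really is to look at residual normality, a minimal sketch along those lines (with simulated data; shapiro.test and a Q-Q plot are used here as one common choice, not a prescription):

set.seed(7)
d   <- data.frame(x = runif(50))
d$y <- 2 + 3 * d$x + rnorm(50)
fit <- lm(y ~ x, data = d)

# inspect the residuals, not the raw predictor
qqnorm(resid(fit)); qqline(resid(fit))
shapiro.test(resid(fit))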

Perhaps you should consult a local statistician? That might help you define the statistical question a bit more precisely and therefore give you a better chance of identifying a difference if one actually exists. That would help avoid a "Type II error": failing to support a conclusion of difference when such a difference is present.