This question is somewhat related to this one.
I am performing a two-sample KS test in R and I think I have not fully understood the issue of ties.
Reading the help: "The presence of ties always generates a warning, since continuous distributions do not generate them. If the ties arose from rounding, the tests may be approximately valid, but even modest amounts of rounding can have a significant effect on the calculated statistic."
I understand this in the one-sample case, but why do I get the same warning when the tie comes from the same value appearing in both vectors?
Example:
No-ties case:
set.seed(123)
x <- rnorm(50)
y <- runif(30)
ks.test(x, y)
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.52, p-value = 3.885e-05
alternative hypothesis: two-sided
Case with ties:
x <- c(0, 1, 1, rnorm(47)) # the value 1 appears twice in this vector
y <- c(1, runif(29))       # and once in this one
ks.test(x, y)
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.5, p-value = 0.0001696
alternative hypothesis: two-sided
Warning message:
In ks.test(x, y) : cannot compute exact p-value with ties
Case where I thought there would be no ties, but in fact there are:
x <- c(0,1,1, rnorm(47))
y <- c(1,runif(29))
ks.test(unique(x), unique(y))
Two-sample Kolmogorov-Smirnov test
data: unique(x) and unique(y)
D = 0.59184, p-value = 4.363e-06
alternative hypothesis: two-sided
Warning message:
In ks.test(unique(x), unique(y)) : cannot compute exact p-value with ties
Best Answer
The reason in the one-sample case is exactly the same as in the two-sample case: in general, $Pr(X = c) = 0$ for any continuously distributed $X$ and any single value $c$. Ties (within one sample or across the two samples) imply that $Pr(X = c) \ne 0$. In the two-sample test, the exact p-value is computed from the pooled sample under the assumption that, with no ties, every ordering of the pooled observations is equally likely; a value that appears in both x and y is a duplicate in the pooled sample, so that assumption fails just as it does for a within-sample tie. This is also why ks.test(unique(x), unique(y)) still warns: unique() removes duplicates within each vector, but the value 1 still occurs in both vectors, so the pooled sample still contains a tie.
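To see this concretely, here is a minimal sketch (not ks.test()'s actual internals) of the tie check that matters for the exact two-sample p-value: duplicates are looked for in the pooled sample, not within each vector separately.

```r
# Reproducing the question's third example: deduplicate each
# vector on its own, then pool them as the two-sample test does.
set.seed(123)
x <- c(0, 1, 1, rnorm(47))
y <- c(1, runif(29))

pooled <- c(unique(x), unique(y))  # within-vector duplicates removed
any(duplicated(pooled))            # TRUE: the value 1 is in both vectors
```

Because the value 1 survives unique() in both x and y, the pooled sample still holds a duplicate, which is exactly the condition that triggers the "cannot compute exact p-value with ties" warning.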