This question is somewhat related to this one.
I am performing a two-sample KS test in R and I think I have not fully understood the issue of ties.
Reading the help: "The presence of ties always generates a warning, since continuous distributions do not generate them. If the ties arose from rounding, the tests may be approximately valid, but even modest amounts of rounding can have a significant effect on the calculated statistic."
I understand this in the one-sample case, but why do I get the same warning when the tie comes from the same value appearing in both vectors?
Example:
No-ties case:
set.seed(123)
x <- rnorm(50)
y <- runif(30)
ks.test(x, y)
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.52, p-value = 3.885e-05
alternative hypothesis: two-sided
Case with ties:
x <- c(0, 1, 1, rnorm(47)) # the value 1 appears twice in this vector
y <- c(1, runif(29))       # and once in this one
ks.test(x, y)
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.5, p-value = 0.0001696
alternative hypothesis: two-sided
Warning message:
In ks.test(x, y) : cannot compute exact p-value with ties
Case where I thought there would be no ties, but in fact there are:
x <- c(0,1,1, rnorm(47))
y <- c(1,runif(29))
ks.test(unique(x), unique(y))
Two-sample Kolmogorov-Smirnov test
data: unique(x) and unique(y)
D = 0.59184, p-value = 4.363e-06
alternative hypothesis: two-sided
Warning message:
In ks.test(unique(x), unique(y)) : cannot compute exact p-value with ties
Best Answer
The reason in the one-sample case is exactly the same as in the two-sample case: in general, $Pr(X = c) = 0$ for any continuously distributed $X$ and any single value $c$. Ties (within one sample or across the two samples) imply that $Pr(X = c) \ne 0$. In the two-sample test, the exact p-value is computed from the pooled sample under the assumption that, with no ties, every ordering of the pooled observations is equally likely; a value that appears in both x and y is a duplicate in the pooled sample, so that assumption fails just as it does for a within-sample tie. This is also why ks.test(unique(x), unique(y)) still warns: unique() removes duplicates within each vector, but the value 1 still occurs in both vectors, so the pooled sample still contains a tie.
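To see this concretely, here is a minimal sketch (not ks.test()'s actual internals) of the tie check that matters for the exact two-sample p-value: duplicates are looked for in the pooled sample, not within each vector separately.

```r
# Reproducing the question's third example: deduplicate each
# vector on its own, then pool them as the two-sample test does.
set.seed(123)
x <- c(0, 1, 1, rnorm(47))
y <- c(1, runif(29))

pooled <- c(unique(x), unique(y))  # within-vector duplicates removed
any(duplicated(pooled))            # TRUE: the value 1 is in both vectors
```

Because the value 1 survives unique() in both x and y, the pooled sample still holds a duplicate, which is exactly the condition that triggers the "cannot compute exact p-value with ties" warning.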