Solved – ks.test and ks.boot – exact p-values and ties

Tags: kolmogorov-smirnov test, r, ties

I am confused by the behaviour of ks.test (package stats) (a) in the presence of ties and (b) when one-sided in a two-sample test. Documentation: "Exact p-values are not available for the two-sample case if one-sided or in the presence of ties."

I want to ask whether black (experiment) and red (control) follow the same distribution function, without knowing the underlying distribution.

In my hands, precise p-values are computed for the one-sided tests even in the presence of ties (as indicated by the warning message). For the two-sided test, however, the p-value is only reported as < 2.2e-16, not as an exact number.

If interested, you may download the data as .Rda files (vector length ~ 9000):

https://www.dropbox.com/s/xl29jvpurkbwqpm/black.Rda?dl=0

https://www.dropbox.com/s/5biptm1xet36v3v/red.Rda?dl=0

Example:

ks.test (black, red)

    Two-sample Kolmogorov-Smirnov test

data:  black and red
D = 0.0731, p-value < 2.2e-16
alternative hypothesis: two-sided

ks.test (black, red)$p.value 

[1] 0
Warning message:
In ks.test(black, red) :
  p-value will be approximate in the presence of ties

ks.test (black, red, alternative="g")$p.value # not as expected

[1] 1.235537e-23
Warning message:
In ks.test(black, red, alternative = "g") :
  p-value will be approximate in the presence of ties

ks.test (black, red, alternative="l")$p.value

[1] 0.0005651143
Warning message:
In ks.test(black, red, alternative = "l") :
  p-value will be approximate in the presence of ties

I tried ks.boot (package "Matching"), which claims to work for two-sample tests with ties and "provides correct coverage even when the distributions being compared are not entirely continuous." Same story: I get precise p-values for the one-sided alternatives only. For instance:

ks.boot (black, red, alternative="l")

$ks.boot.pvalue
[1] 0.001

$ks

Two-sample Kolmogorov-Smirnov test

data:  Tr and Co
D^- = 0.0275, p-value = 0.0005651
alternative hypothesis: the CDF of x lies below that of y

$nboots
[1] 1000

attr(,"class")
[1] "ks.boot"

Did I misunderstand the sentence "exact p-values are not available for the two-sample case if one-sided or in the presence of ties"? I thought it meant: no exact p-value if one-sided or …

Are the p-values of ks.test (two-sample, one-sided) "correct"?

In terms of delivering exact p-values, ks.boot was not superior.

Can anybody please comment on this?
Thanks
Hermann


@Roland My problem: "Exact p-values are not available for the two-sample case if one-sided or in the presence of ties" (ks.test). Maybe I was confused by the term "exact", which has a defined statistical meaning. But I get "precise" (in the sense of a precise number) p-values for the one-sided but not for the two-sided test …

  ks.test (black, red, alternative="g")$p.value # one-sided
  [1] 1.235537e-23 # precise p-value
  Warning message:
  In ks.test(black, red, alternative = "g") :
  p-value will be approximate in the presence of ties

  ks.test(black, red)$p.value # two.sided
  [1] 0 # Is this precise?

The most "precise" p-value (ks.test, two.sided) …

 ks.test (black, red)

 Two-sample Kolmogorov-Smirnov test

 data:  black and red
  D = 0.0731, p-value < 2.2e-16
  alternative hypothesis: two-sided

  Warning message:
  In ks.test(black, red) :
  p-value will be approximate in the presence of ties

I was confused that a p-value of 0 is returned by $p.value when the printed output says p-value < 2.2e-16 (two-sided ks.test). Most likely this has nothing to do with "exact" p-values.
So the answer might be: these are approximate p-values (according to the documentation, because there are ties). But that does not explain the different behaviour of the reported p-values: I get a "precise" (approximate) p-value for the one-sided but not for the two-sided test … For statistical reasons?

Further, I don't get a "precise" p-value for the two-sided ks.boot either (which should be "exact"). It is < 2.2e-16 and ks.boot.pvalue is again 0. So where is the "exact" ks.boot.pvalue for the two-sided test? There is only the p-value of the ks.test.

ks.boot (black, red)

$ks.boot.pvalue
[1] 0 # no precise ks.boot.pvalue reported

$ks

Two-sample Kolmogorov-Smirnov test

data:  Tr and Co
D = 0.0731, p-value < 2.2e-16
alternative hypothesis: two-sided

$nboots
[1] 1000

attr(,"class")
[1] "ks.boot"

Are "precise" p-values (ks.boot) only reported for one-sided alternatives?

Thanks Hermann

Best Answer

Two points are being confused here. The first is about the words "exact" and "approximate" in a statistical context. "Exact" means that no simplifications are used in carrying out the calculation. An "approximate" p-value does not mean the value is rounded to some precision; it means that some simplifications were used in calculating it. Both "exact" and "approximate" calculations give precise numerical values; only our confidence in them differs. The second point: it is merely the formatting of the output that gives you non-precise values. You are actually invoking the same result in different ways.

ks.test (black, red, alternative="l")$p.value
ks.test (black, red, alternative="g")$p.value
ks.test (black, red)$p.value

all give you precise (not rounded) values, because you are extracting the stored value directly. In the last case the p-value is so small that it falls below machine precision and is therefore listed as 0. But when you simply call the function, it gives you human-readable output, and in preparing that output the p-value is passed through the format.pval() function. First, check the consistency of ks.test(black, red) and ks.test(black, red, alternative="g"): the p-values are the same in the non-precise format. Now compare

ks.test (black, red, alternative="g")$p.value and

format.pval(ks.test (black, red, alternative="g")$p.value)

Now it should be clear how the "p-value < 2.2e-16" is produced.
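To see the formatting effect in isolation, here is a minimal sketch (the value below is the poster's one-sided p-value; eps = 0 is only used to disable the cutoff, not something the print method does):

```r
# The print method for test results formats p-values with format.pval(),
# which reports any value below `eps` (default .Machine$double.eps,
# about 2.22e-16) as "< eps". The stored value keeps full precision.
p <- 1.235537e-23                # raw value, as returned by $p.value

format.pval(p)                   # below the cutoff: printed as "< eps"
format.pval(p, eps = 0)          # cutoff disabled: the full value is formatted
```

So the "< 2.2e-16" in the printed output and the 0 (or 1.235537e-23) from $p.value are the same numbers, shown through different formatting paths.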

And finally, about ks.boot(). It uses bootstrapping. While ks.test() obtains the probability of the test statistic from the Kolmogorov distribution (which describes how the test statistic is distributed when the two samples really are drawn from the same distribution), ks.boot() obtains it from an empirical distribution derived under the null hypothesis. That is, the two samples under study are combined, and from this pooled set two new samples are drawn at random with replacement. These new samples are by construction drawn from the same distribution, and their test statistic is recorded. Repeating this procedure many times yields the empirical distribution of the test statistic under the null hypothesis.

The number of repeats is the nboots variable in the ks.boot() output; you used the default of 1000, so you simulated 1000 test-statistic values under the null hypothesis. Your actual test statistic is greater than all 1000 of them, which means the p-value is at most 0.001 - that is the ks.boot.pvalue. Call ks.boot(red, black, nboots=10000) and you will obtain ks.boot.pvalue = 0.0001. To resolve a p-value of a given order with ks.boot(), nboots must exceed the reciprocal of that p-value (i.e. more than $10^{23}$ replicates here). I recommend you not do this, since it would hang your computer or throw a memory exception.

In any case, precise p-values of such small order have no practical use. They are very sensitive to small changes in the data, so repeated experiments would yield wildly different p-values; the smaller the p-value, the less confidence should be placed in its precise value.
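The resampling scheme described above can be sketched in a few lines (an illustration only: simulated stand-ins for black and red, small sizes for speed, and Matching::ks.boot differs in implementation details):

```r
set.seed(42)
black <- round(rnorm(300), 1)          # stand-ins for the poster's data;
red   <- round(rnorm(300, 0.3), 1)     # rounding deliberately creates ties

obs_D  <- suppressWarnings(ks.test(black, red)$statistic)  # observed statistic
pooled <- c(black, red)                # combine the samples under the null
nboots <- 500

boot_D <- replicate(nboots, {
  x <- sample(pooled, length(black), replace = TRUE)  # redraw both samples
  y <- sample(pooled, length(red),   replace = TRUE)  # from the pooled data
  suppressWarnings(ks.test(x, y)$statistic)
})

# Fraction of bootstrap statistics at least as extreme as the observed one.
# The smallest resolvable nonzero value is 1/nboots; a result of 0 just means
# the observed statistic exceeded every one of the nboots replicates.
p_boot <- mean(boot_D >= obs_D)
```

This makes the resolution limit concrete: with nboots = 500 the estimate can never land between 0 and 1/500, which is exactly why a two-sided ks.boot.pvalue of 0 appears when the true p-value is astronomically small.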