Solved – ks.test and ks.boot – exact p-values and ties

Tags: kolmogorov-smirnov test, r, ties

I am confused by the behaviour of ks.test (package stats) (a) in the presence of ties and (b) when one-sided in a two-sample test. Documentation: "Exact p-values are not available for the two-sample case if one-sided or in the presence of ties."

I want to ask whether black (experiment) and red (control) follow the same distribution function, without knowing the underlying distribution.

In my hands, precise p-values are computed for the one-sided tests even in the presence of ties (as indicated by the warning message). For the two-sided test, however, the p-value is only reported as < 2.2e-16, not as an exact number.

If interested, you may download the data as .Rda files (vector length ~ 9000):

https://www.dropbox.com/s/xl29jvpurkbwqpm/black.Rda?dl=0

https://www.dropbox.com/s/5biptm1xet36v3v/red.Rda?dl=0

Example:

ks.test (black, red)

    Two-sample Kolmogorov-Smirnov test

data:  black and red
D = 0.0731, p-value < 2.2e-16
alternative hypothesis: two-sided

ks.test (black, red)$p.value 

[1] 0
Warning message:
In ks.test(black, red) :
  p-value will be approximate in the presence of ties

ks.test (black, red, alternative="g")$p.value # not as expected

[1] 1.235537e-23
Warning message:
In ks.test(black, red, alternative = "g") :
  p-value will be approximate in the presence of ties

ks.test (black, red, alternative="l")$p.value

[1] 0.0005651143
Warning message:
In ks.test(black, red, alternative = "l") :
  p-value will be approximate in the presence of ties

I tried ks.boot (package "Matching"), which claims to work for two-sample tests with ties and "provides correct coverage even when the distributions being compared are not entirely continuous." Same story: I get precise p-values for the one-sided alternatives only. For instance:

ks.boot (black, red, alternative="l")

$ks.boot.pvalue
[1] 0.001

$ks

Two-sample Kolmogorov-Smirnov test

data:  Tr and Co
D^- = 0.0275, p-value = 0.0005651
alternative hypothesis: the CDF of x lies below that of y

$nboots
[1] 1000

attr(,"class")
[1] "ks.boot"

Did I misunderstand the sentence "exact p-values are not available for the two-sample case if one-sided or in the presence of ties"? I thought it meant: no exact p-value if one-sided or …

Are the p-values of ks.test (two-sample, one-sided) "correct"?

In terms of delivering exact p-values, ks.boot was not superior.

Can anybody please comment on this?
Thanks
Hermann


@Roland My problem: "Exact p-values are not available for the two-sample case if one-sided or in the presence of ties" (ks.test). Maybe I was confused by the term "exact", which has a defined statistical meaning. But I get "precise" (in the sense of a precise number) p-values for the one-sided but not for the two-sided test …

  ks.test (black, red, alternative="g")$p.value # one-sided
  [1] 1.235537e-23 # precise p-value
  Warning message:
  In ks.test(black, red, alternative = "g") :
  p-value will be approximate in the presence of ties

  ks.test(black, red)$p.value # two.sided
  [1] 0 # Is this precise?

The most "precise" p-value (ks.test, two.sided) …

 ks.test (black, red)

 Two-sample Kolmogorov-Smirnov test

 data:  black and red
  D = 0.0731, p-value < 2.2e-16
  alternative hypothesis: two-sided

  Warning message:
  In ks.test(black, red) :
  p-value will be approximate in the presence of ties

I was confused that a p-value of 0 is returned by $p.value when the printed output says p-value < 2.2e-16 (two-sided ks.test). Most likely this has nothing to do with "exact" p-values.
So the answer might be: these are approximate p-values (according to the documentation, because there are ties). But that does not explain the different behaviour of the reported p-values: I get a "precise" (approximate) p-value for the one-sided but not for the two-sided test … For statistical reasons?

Further, I don't get a "precise" p-value for the two-sided ks.boot either (which should be "exact"). It is < 2.2e-16 and ks.boot.pvalue is again 0. So where is the "exact" ks.boot.pvalue for the two-sided test? There is only the p-value of the ks.test.

ks.boot (black, red)

$ks.boot.pvalue
[1] 0 # no precise ks.boot.pvalue reported

$ks

Two-sample Kolmogorov-Smirnov test

data:  Tr and Co
D = 0.0731, p-value < 2.2e-16
alternative hypothesis: two-sided

$nboots
[1] 1000

attr(,"class")
[1] "ks.boot"

Are "precise" p-values (ks.boot) only reported for one-sided alternatives?

Thanks Hermann

Best Answer

Two points are being confused here. The first is about the words "exact" and "approximate" in a statistical context. "Exact" means that no simplifications are used in carrying out the calculation. An "approximate" p-value does not mean the value is rounded to some precision; it means that some simplifications were used in calculating it. Both "exact" and "approximate" calculations give precise numerical values; only our confidence in them differs. The second point: it is merely the formatting of the output that gives you non-precise values. You are actually invoking the same result in different ways.

ks.test (black, red, alternative="l")$p.value
ks.test (black, red, alternative="g")$p.value
ks.test (black, red)$p.value

all give you precise (not rounded) values, because you are extracting the stored value directly. In the last case the p-value is so small that it falls below machine precision and is therefore listed as 0. But when you simply call the function, it gives you human-readable output, and in preparing that output the p-value is passed through the format.pval() function. First, check the consistency of ks.test(black, red) and ks.test(black, red, alternative="g"): the p-values are the same in the non-precise format. Now compare

ks.test (black, red, alternative="g")$p.value and

format.pval(ks.test (black, red, alternative="g")$p.value)

Now it should be clear how the "p-value < 2.2e-16" is produced.
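To see the formatting effect in isolation, here is a minimal sketch (the value below is the poster's one-sided p-value; eps = 0 is only used to disable the cutoff, not something the print method does):

```r
# The print method for test results formats p-values with format.pval(),
# which reports any value below `eps` (default .Machine$double.eps,
# about 2.22e-16) as "< eps". The stored value keeps full precision.
p <- 1.235537e-23                # raw value, as returned by $p.value

format.pval(p)                   # below the cutoff: printed as "< eps"
format.pval(p, eps = 0)          # cutoff disabled: the full value is formatted
```

So the "< 2.2e-16" in the printed output and the 0 (or 1.235537e-23) from $p.value are the same numbers, shown through different formatting paths.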

And finally, about ks.boot(). It uses bootstrapping. While ks.test() obtains the probability of the test statistic from the Kolmogorov distribution (which describes how the test statistic is distributed when the two samples really are drawn from the same distribution), ks.boot() obtains it from an empirical distribution derived under the null hypothesis. That is, the two samples under study are combined, and from this pooled set two new samples are drawn at random with replacement. These new samples are by construction drawn from the same distribution, and their test statistic is recorded. Repeating this procedure many times yields the empirical distribution of the test statistic under the null hypothesis.

The number of repeats is the nboots variable in the ks.boot() output; you used the default of 1000, so you simulated 1000 test-statistic values under the null hypothesis. Your actual test statistic is greater than all 1000 of them, which means the p-value is at most 0.001 - that is the ks.boot.pvalue. Call ks.boot(red, black, nboots=10000) and you will obtain ks.boot.pvalue = 0.0001. To resolve a p-value of a given order with ks.boot(), nboots must exceed the reciprocal of that p-value (i.e. more than $10^{23}$ replicates here). I recommend you not do this, since it would hang your computer or throw a memory exception.

In any case, precise p-values of such small order have no practical use. They are very sensitive to small changes in the data, so repeated experiments would yield wildly different p-values; the smaller the p-value, the less confidence should be placed in its precise value.
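The resampling scheme described above can be sketched in a few lines (an illustration only: simulated stand-ins for black and red, small sizes for speed, and Matching::ks.boot differs in implementation details):

```r
set.seed(42)
black <- round(rnorm(300), 1)          # stand-ins for the poster's data;
red   <- round(rnorm(300, 0.3), 1)     # rounding deliberately creates ties

obs_D  <- suppressWarnings(ks.test(black, red)$statistic)  # observed statistic
pooled <- c(black, red)                # combine the samples under the null
nboots <- 500

boot_D <- replicate(nboots, {
  x <- sample(pooled, length(black), replace = TRUE)  # redraw both samples
  y <- sample(pooled, length(red),   replace = TRUE)  # from the pooled data
  suppressWarnings(ks.test(x, y)$statistic)
})

# Fraction of bootstrap statistics at least as extreme as the observed one.
# The smallest resolvable nonzero value is 1/nboots; a result of 0 just means
# the observed statistic exceeded every one of the nboots replicates.
p_boot <- mean(boot_D >= obs_D)
```

This makes the resolution limit concrete: with nboots = 500 the estimate can never land between 0 and 1/500, which is exactly why a two-sided ks.boot.pvalue of 0 appears when the true p-value is astronomically small.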