Kolmogorov-Smirnov Test – How to Understand ks.boot Test Results

kolmogorov-smirnov test

I am trying to use ks.boot to compare two numerical distributions for 2 different phenomena. The 2 numerical lists are at http://pastebin.com/s6ATRLDC and http://pastebin.com/hEc7Mqhp

When I perform the ks.boot analyses, I get the following info on-screen:

> in1=read.table("Aco195_INTERGENE-BEST-sister.UNalnPseudoCDS_Len", header=TRUE)
> in2=read.table("Aco195_PSEUDOGENE-BEST-sister.UNalnPseudoCDS_Len", header=TRUE)
> NDM=in1[,1]
> DM=in2[,1]

> summary(NDM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   33.0   132.0   189.0   265.1   273.0  4332.0 
> summary(DM)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     33     159     243     358     408    4353 

> ks1<-ks.boot(NDM,DM,alternative=c("g"))
> summary(ks1)
Bootstrap p-value:     < 2.22e-16 
Naive p-value:         1.4532e-270 
Full Sample Statistic: 0.20158 

> ks2<-ks.boot(NDM,DM,alternative=c("l"))
> summary(ks2)
Bootstrap p-value:     < 2.22e-16 
Naive p-value:         2.2676e-05 
Full Sample Statistic: 0.026446 

> ks.boot(NDM,DM,alternative=c("l"))
$ks.boot.pvalue
[1] 0

$ks

    Two-sample Kolmogorov-Smirnov test

data:  Tr and Co
D^- = 0.0264, p-value = 2.268e-05
alternative hypothesis: the CDF of x lies below that of y


$nboots
[1] 1000

attr(,"class")
[1] "ks.boot"


> ks.boot(NDM,DM,alternative=c("g"))
$ks.boot.pvalue
[1] 0

$ks

    Two-sample Kolmogorov-Smirnov test

data:  Tr and Co
D^+ = 0.2016, p-value < 2.2e-16
alternative hypothesis: the CDF of x lies above that of y

$nboots
[1] 1000

attr(,"class")
[1] "ks.boot"

I am confused because both p-values seem low enough to reject the null hypothesis in favor of the alternative. However, that seems absurd, because one numerical distribution cannot be both higher and lower than the other! So obviously I am interpreting this wrong. I don't have much stats background, so hopefully someone can spell it out.

I have not had much luck seeing how the content posted at "Goodness of fit test for a mixture in R" or "ks.test and ks.boot – exact p-values and ties" can help clarify my confusion. In my mind, it's as though both alternative hypotheses ("g" and "l") are acceptable, and one is even less likely to occur by chance than the other, so would calculating the ratio of these 2 p-values give something meaningful? Or should I rather choose the alternative hypothesis that yields the lowest p-value? Or maybe I am totally off. Help, please!

Best Answer

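This happens because each one-sided test looks only at the largest gap between the two empirical CDFs in one direction. If the CDFs cross, both directions can show a significant gap. The two-sided test statistic takes the absolute value of the gap instead: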

$$D_n= \sup_x |F_n(x)-F(x)|$$

Source
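For the one-sided tests in the question, the printed statistics are `D^+` and `D^-` rather than the absolute-value form. As a sketch, assuming `ks.boot` follows the convention of base R's `ks.test` (the output labels suggest so), and writing $F_x$ and $F_y$ for the empirical CDFs of the first and second sample (notation introduced here for illustration only):

$$D^+ = \sup_t \,[F_x(t) - F_y(t)], \qquad D^- = \sup_t \,[F_y(t) - F_x(t)], \qquad D = \max(D^+, D^-)$$

Plugging in the values reported in the question gives $D^+ = 0.2016$ and $D^- = 0.0264$, so $D = 0.2016$. Both one-sided statistics can be strictly positive only if the two CDFs cross: over part of the range the CDF of NDM lies above that of DM (by up to 0.2016), and over another part it lies below (by up to 0.0264). With samples this large, even the small gap is statistically significant, so neither one-sided result contradicts the other.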

To get a "correct" result you could use a two-sided test, which will simply tell you whether your distributions are different or not.
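A minimal sketch of this, using simulated data rather than the data from the question, and base R's `ks.test` rather than `ks.boot` so that it runs without extra packages. The two samples share a centre but differ in spread, so their CDFs cross and both one-sided tests come out significant, while the two-sided test reports the larger of the two gaps:

```r
# Simulated data (NOT the data from the question): same centre, different
# spread, so the two empirical CDFs cross near 0.
set.seed(1)
x <- rnorm(5000, mean = 0, sd = 1)
y <- rnorm(5000, mean = 0, sd = 2)

# One-sided tests: each looks at the largest gap in one direction only.
greater <- ks.test(x, y, alternative = "greater")   # statistic D^+
less    <- ks.test(x, y, alternative = "less")      # statistic D^-

# Two-sided test: largest gap in either direction.
two <- ks.test(x, y, alternative = "two.sided")     # statistic D

d_plus  <- unname(greater$statistic)
d_minus <- unname(less$statistic)
d_two   <- unname(two$statistic)

print(c(D_plus = d_plus, D_minus = d_minus, D = d_two))
print(c(p_greater = greater$p.value,
        p_less    = less$p.value,
        p_two     = two$p.value))

# Plot both empirical CDFs to see where they cross.
plot(ecdf(x), do.points = FALSE, main = "Empirical CDFs",
     xlab = "value", ylab = "F(value)")
plot(ecdf(y), do.points = FALSE, add = TRUE, col = "red")
legend("bottomright", legend = c("x", "y"),
       col = c("black", "red"), lty = 1)
```

With the real data, `x` and `y` would be replaced by `NDM` and `DM`. Assuming `ks.boot` accepts the same `alternative` values as `ks.test`, the two-sided call would be `ks.boot(NDM, DM, alternative = "two.sided")`. Plotting the two empirical CDFs is probably the quickest way to see where one distribution sits above the other.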

Here is some discussion about it.

They also provide another useful link about this topic