Solved – KS test Enquiry in R

Tags: hypothesis-testing, r

I have a data vector $p$ and I want to test whether its true distribution is Uniform[0,1]. So I ran two similar KS tests, one with $y = p$ and the other with $y = \mathrm{ecdf}(p)$.

ks.test(runif(length(p), min = 0, max = 1), p)

Please forgive me for not showing you the data vector $p$, since its length is 100,000. Nonetheless, all entries of $p$ take values between 0 and 1, i.e. in [0,1] (inclusive).

The result of the above test is:

    Two-sample Kolmogorov-Smirnov test

data:  runif(length(p), min = 0, max = 1) and p
D = 0.0082, p-value = 0.2463
alternative hypothesis: two-sided

but it comes with a warning message at the end:

Warning message:
In ks.test(runif(length(p), min = 0, max = 1), p) :
  cannot compute correct p-values with ties

The second KS test, run at the same time (which I thought would avoid the warning message):

print(ks.test(runif(length(p), min = 0, max = 1), ecdf(p)))


    One-sample Kolmogorov-Smirnov test

data:  runif(length(p), min = 0, max = 1)
D = 0.0093, p-value = 0.008757
alternative hypothesis: two-sided

I have looked at other posts, but I still don't understand the warning message. Question 1: What are the "ties" referring to?

Now, I checked R's help and it says $y$ can be either a data vector or the CDF of a data vector. In the latter case it performs a one-sample KS test, and I have no idea how that differs from a two-sample KS test.
Question 2: What is the difference between a one-sample and a two-sample KS test?

I thought the two KS tests I ran above were very similar (if not identical), since I just want to know whether the data vector follows a continuous uniform distribution. And I thought the second KS test (using $\mathrm{ecdf}(p)$) would avoid the warning message. If the two KS tests are actually doing different things, please tell me and advise which one I should use.

Question 3: The two KS tests yield different p-values: the first is > 0.05, while the second is < 0.05. But if I plot a histogram of my data vector, it actually looks uniform to me, which suggests there is something subtle about the second KS test that I don't know about (any ideas?):

Question 4: How should I handle the warning message from the first KS test? Can I just ignore it?

Note: I wrote a function that runs the two KS tests above and plots the histogram, all at once.

[Histogram of the data vector $p$; it looks approximately uniform]

Best Answer

  1. Ties are repeated values in a sample. You can run anyDuplicated(p) to check whether there are any. The KS test assumes a continuous distribution, under which ties occur with probability zero, so ranks (and hence exact p-values) are ambiguous when ties are present. However, if there are only a few of them, there is no reason to worry.
  2. In a two-sample KS test you are testing whether two empirical distributions come from the same (unspecified) theoretical distribution. In a one-sample KS test you are testing whether a given empirical distribution comes from a given, fully specified theoretical distribution. Here is an example. Suppose one empirical distribution is positively skewed, and a second has the same shape but is negatively skewed. The two-sample KS test asks whether both samples could be drawn from some common third distribution (one with zero skewness in this example), and the p-value can be > 0.05. The one-sample test instead asks whether the first sample could be drawn from the second distribution specifically, which it most likely cannot be, and thus p falls below 0.05.
  3. Because in your second test you did not test whether the sample can be drawn from the uniform distribution, but whether it can be drawn from the empirical distribution of p, which is only similar to uniform.
  4. If sum(duplicated(p)) is small compared with length(p), then you should not worry about it. (Note that length(duplicated(p)) always equals length(p), since duplicated() returns a logical vector; sum() counts the tied values.)
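A small R sketch illustrating points 1 and 2 above (the vectors below are simulated for illustration, since the original data is not available): rounding a continuous sample creates ties, and a one-sample test against a fixed distribution asks a different question than a two-sample test.

```r
set.seed(1)  # for reproducibility

# Point 1: ties. Continuous draws essentially never repeat, but rounding
# (or limited measurement precision) creates ties.
x_exact   <- runif(1e5)
x_rounded <- round(x_exact, 3)   # only 1001 possible values -> many ties
anyDuplicated(x_exact)           # almost surely 0: no ties
anyDuplicated(x_rounded) > 0     # TRUE: ties present
sum(duplicated(x_rounded))       # number of tied values

# Point 2: one-sample vs two-sample.
# Two-sample: could both samples come from the SAME (unspecified) distribution?
a <- rbeta(1000, 5, 2)           # negatively skewed sample
b <- rbeta(1000, 2, 5)           # positively skewed sample
ks.test(a, b)                    # compares a and b with each other

# One-sample: could `a` come from one FIXED, fully specified distribution?
ks.test(a, pbeta, 2, 5)          # compares a with Beta(2, 5) directly
```

In the two-sample call the reference distribution is never named; only the two empirical CDFs are compared. In the one-sample call the null distribution is pinned down completely, so the test is far more sensitive to the mismatch between `a` and Beta(2, 5).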

And finally: if you want to test whether an empirical distribution comes from the uniform distribution, you should use the one-sample KS test in this notation: ks.test(p, "punif").
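As a minimal sketch of that recommendation (with `p` simulated here, since the real data vector is not available): `punif` defaults to `min = 0, max = 1`, and no second random sample is drawn, so the result does not vary from run to run given `p`.

```r
set.seed(42)
p <- runif(1e5)  # stand-in for the real data vector

# One-sample KS test of p against the Uniform(0, 1) CDF.
# Unlike ks.test(runif(length(p)), p), this introduces no extra
# sampling noise from a second randomly generated sample.
ks.test(p, "punif", min = 0, max = 1)
```

Note that if `p` itself contains duplicated values, `ks.test` will still warn about ties, because the KS test assumes the underlying distribution is continuous.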