I have a data vector $p$ and I want to test whether its true distribution is uniform on [0,1]. So I ran two similar KS tests, one with $y = p$ and the other with $y = ecdf(p)$.

```
ks.test(runif(length(p), min = 0, max = 1), p)
```

Please forgive me for not being able to show you the data vector $p$, because its length is 100000. Nonetheless, all entries of $p$ take values between 0 and 1, i.e. in [0,1] (inclusive).

The result of the above test is:

```
Two-sample Kolmogorov-Smirnov test
data: runif(length(p), min = 0, max = 1) and p
D = 0.0082, p-value = 0.2463
alternative hypothesis: two-sided
```

but it comes with a warning message at the end:

```
Warning message:
In ks.test(runif(length(p), min = 0, max = 1), p) :
cannot compute correct p-values with ties
```
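For context, the warning can be reproduced on a small made-up vector. This is just my own toy illustration, not the question's $p$; the rounding exists only to force repeated values:

```r
# Toy illustration of "ties": rounding to 1 decimal leaves only 11 possible
# values, so among 50 draws some values are guaranteed to repeat.
set.seed(1)
x <- runif(50)
p_toy <- round(runif(50), 1)

anyDuplicated(p_toy)   # > 0: index of the first duplicated entry
ks.test(x, p_toy)      # older R versions warn about ties here
```

(Newer versions of R can compute exact two-sample p-values in the presence of ties, so the exact wording of the warning may differ by version.)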

The second KS test, which was run *at the same time* (and which I thought would avoid the warning message):

```
print(ks.test(runif(length(p), min = 0, max = 1), ecdf(p)))
```

Results:

```
One-sample Kolmogorov-Smirnov test
data: runif(length(p) + 1000, min = 0, max = 1)
D = 0.0093, p-value = 0.008757
alternative hypothesis: two-sided
```

I have looked at other posts, but I still don't understand the warning message. **Question 1:** What are the "ties" referring to?

Now, I checked R's help, and it says $y$ can be either a data vector or the CDF of a data vector. In the latter case it performs a one-sample KS test, and I have no idea how that differs from the two-sample KS test.

**Question 2:** What is the difference between the one-sample and the two-sample KS test?

I thought the two KS tests I ran above were very similar (if not identical), since I just want to know whether the data vector follows a continuous uniform distribution. I also thought the second KS test (using $ecdf(p)$) would avoid the warning message. If the two KS tests are actually doing different things, please tell me and advise which one I should use.
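A minimal sketch of the two forms of the call, on simulated data (the variable names are mine; `punif` is the uniform CDF built into R):

```r
set.seed(42)
x <- runif(1000)

# Two-sample form: x is compared against another finite random sample,
# which carries its own sampling noise.
two_sample <- ks.test(x, runif(1000))

# One-sample form: x is compared directly against the exact theoretical
# CDF of the Uniform(0,1) distribution.
one_sample <- ks.test(x, "punif", min = 0, max = 1)

c(two_sample$p.value, one_sample$p.value)
```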

**Question 3:** The two KS tests yield different p-values: the first one > 0.05, the second < 0.05. But if I plot a histogram of my data vector, it actually looks uniform to me, which seems to suggest there is something subtle about the second KS test that I don't know about (any ideas?).

**Question 4:** Please advise me how to handle the warning message from the first KS test. Can I just ignore it?

Note: I wrote a function that runs the two KS tests above and plots the histogram, all in one go.

## Best Answer

Use

`anyDuplicated(p)`

to check whether there are ties; it returns the index of the first duplicated entry, or 0 if there are none. There is uncertainty in the quantiles when ties are present, but if their number is not high, there is no reason to worry about them.

As for the differing p-values: the two-sample test compares your data against another finite random sample drawn from the theoretical distribution, so both samples carry sampling noise and the p-value can be > 0.05. The one-sample test asks whether your sample could have been drawn from the theoretical distribution itself, a stricter comparison, and so the p-value falls below 0.05.

If

`sum(duplicated(p))`

(not `length(duplicated(p))`, which always equals `length(p)` because `duplicated` returns a logical vector) is small compared with `length(p)`, then you should not worry about the ties.

And finally: if you want to test whether an empirical distribution comes from a uniform, you should use the one-sample KS test in this notation:

`ks.test(p, "punif")`
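A sketch of that recommendation on simulated data (`p_sim` is a stand-in for the question's vector, since the real $p$ is not shown):

```r
set.seed(123)
p_sim <- runif(1e5)        # stand-in for the 100000-entry vector p

sum(duplicated(p_sim))     # count of tied values present in the sample
ks.test(p_sim, "punif", min = 0, max = 1)  # one-sample test against Uniform(0,1)
```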