Solved – KS test for Uniformity

hypothesis-testing, r, statistical-significance, uniform-distribution

I am using the KS test to check whether a set of points is uniformly distributed over an interval, and I wondered whether there is a test better suited to what I'm trying to do.

Let's say we have the interval [0,1] and have a set of points in this interval and want to test if they have been drawn from the uniform distribution over this interval.

Say we have two sets of points:

a = seq(0.49, 0.51, by = 0.001)

and

b = seq(0.09, 0.11, by = 0.001)

These two sequences should be equally unlikely to have been drawn from a uniform distribution over the interval. However, the KS test finds b far more significant: the KS statistic is the largest vertical distance between the empirical CDF and the uniform CDF, and because all of b's weight sits near 0.1, its empirical CDF reaches 1 early, leaving a gap of about 0.89 at x = 0.11, whereas a's weight in the middle of the interval leaves a gap of only about 0.49.

For example, using R:

> ks.test(a, "punif")
    One-sample Kolmogorov-Smirnov test
data:  a
D = 0.49, p-value = 3.576e-05
alternative hypothesis: two-sided
> ks.test(b, "punif")
    One-sample Kolmogorov-Smirnov test
data:  b
D = 0.89, p-value < 2.2e-16
alternative hypothesis: two-sided

The KS test therefore reports b as more significantly different from the uniform distribution than a. Is there a test that would treat the two samples equally?
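To see where those two D values come from, here is a minimal sketch (`ks_stat` is my own helper, not a base-R function) that computes the KS statistic directly as the largest gap between the empirical CDF and `punif`:

```r
# Hand-rolled KS statistic: largest vertical gap between the
# empirical CDF and the uniform CDF on [0,1].
ks_stat <- function(x) {
  x <- sort(x)
  n <- length(x)
  cdf <- punif(x)
  # ECDF evaluated just after and just before each sample point
  max(abs((1:n) / n - cdf), abs((0:(n - 1)) / n - cdf))
}

a <- seq(0.49, 0.51, by = 0.001)
b <- seq(0.09, 0.11, by = 0.001)
ks_stat(a)  # 0.49: gap at x = 0.49, where the ECDF is still 0
ks_stat(b)  # 0.89: gap at x = 0.11, where the ECDF has already reached 1
```

The asymmetry is now visible: both samples are equally "clumped", but b's clump forces the empirical CDF much further from the 45-degree line than a's does.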

Best Answer

This is a nice question!

Allow me to use a little more data:

> a = seq(0.49, 0.51, length.out=10/0.01)
> b = seq(0.09, 0.11, length.out=10/0.01)

Here is the Kolmogorov-Smirnov test. The p-values are equally infinitesimal, but note that the test statistic for b is almost twice as large as the one for a.

> ks.test(a, "punif")

        One-sample Kolmogorov-Smirnov test

data:  a
D = 0.49, p-value < 2.2e-16
alternative hypothesis: two-sided

> ks.test(b, "punif")

        One-sample Kolmogorov-Smirnov test

data:  b
D = 0.89, p-value < 2.2e-16
alternative hypothesis: two-sided

Now, one alternative would of course be to bin the interval $[0,1]$ into a sufficiently large number of bins and then run a standard $\chi^2$ test on the resulting contingency table. (This is why I used more data: with bins of width 0.01, the expected count per bin is 10.)

> table.a <- hist(a, breaks=seq(0,1,by=.01), plot=F)$counts
> table.b <- hist(b, breaks=seq(0,1,by=.01), plot=F)$counts
> chisq.test(rbind(table.a,10))

        Pearson's Chi-squared test

data:  rbind(table.a, 10)
X-squared = 1917.9, df = 99, p-value < 2.2e-16

> chisq.test(rbind(table.b,10))

        Pearson's Chi-squared test

data:  rbind(table.b, 10)
X-squared = 1917.9, df = 99, p-value < 2.2e-16

And indeed, the test statistics are identical. The $\chi^2$ statistic depends only on the bin counts, not on which bins they fall in, so a and b are treated exactly alike.
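The same invariance can be checked with `chisq.test`'s direct goodness-of-fit form, passing expected probabilities via `p=` instead of building a two-row table with `rbind`. The statistic it computes sums $(O_i - E_i)^2 / E_i$ over bins, so its value differs from the contingency-table version above, but the equality between a and b holds either way:

```r
# Goodness-of-fit form of the chi-squared test: observed bin counts
# against equal expected probabilities for 100 bins of width 0.01.
a <- seq(0.49, 0.51, length.out = 10 / 0.01)
b <- seq(0.09, 0.11, length.out = 10 / 0.01)
table.a <- hist(a, breaks = seq(0, 1, by = 0.01), plot = FALSE)$counts
table.b <- hist(b, breaks = seq(0, 1, by = 0.01), plot = FALSE)$counts
chi.a <- chisq.test(table.a, p = rep(1 / 100, 100))
chi.b <- chisq.test(table.b, p = rep(1 / 100, 100))
# The statistic only sums (O - E)^2 / E over bins, so shifting the
# same counts into different bins leaves it unchanged.
all.equal(unname(chi.a$statistic), unname(chi.b$statistic))  # TRUE
```

Either way, once the data are reduced to bin counts, the test has no notion of where on $[0,1]$ a bin sits, which is exactly the equal treatment asked for, at the cost of discarding the ordering information the KS test uses.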