Solved – Applying the Kolmogorov-Smirnov test in R

kolmogorov-smirnov test, r

I tried to use the Kolmogorov-Smirnov test to check whether a sample is exponentially distributed. By trial and error, I tried a couple of rates. This is a small, simple example of what I do:

    ks.test(Interarrivaltimes,pexp,0.00029) 

Here is the result R gives me:

    One-sample Kolmogorov-Smirnov test

    data:  Interarrivaltimes
    D = 0.023961, p-value < 2.2e-16
    alternative hypothesis: two-sided

The p-value is very low, whereas I expected the test not to reject the null hypothesis.

I do not understand why it does not work.

Best Answer

It's hard to give a specific answer without the details requested earlier, but I think I can point you in the right general direction.

First, let's consider a sample of n = 15 from an exponential distribution with a rate of 0.00029. When we run the ks.test, we fail to reject the null hypothesis, as expected.

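    set.seed(pi)  # fix the seed so the draws are reproducible

    x <- rexp(15, 0.00029)     # 15 draws from an exponential with rate 0.00029
    ks.test(x, pexp, 0.00029)  # test against that same distribution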

Now let's consider a case where n = 1,000, and the rate is still 0.00029. In this particular instance, we get a p-value of 0.9784. Again, we fail to reject the null hypothesis.

    x2 <- rexp(1000, 0.00029)
    ks.test(x2, pexp, 0.00029)

Now let's look at something we're more likely to see in practice. When we take a sample, we usually have to estimate the parameters of distributions. So if your inter-arrival times come from a sample and you've estimated that the rate is 0.00029, that is only an estimate and doesn't tell us what the true population rate is. Why is this important?

At a small sample size, you probably won't detect much of a difference between your estimated distribution and your population distribution. Let's assume that the population rate is actually 0.00030, but you've gotten a very, very close estimate of 0.00029. A difference of one hundred-thousandth (0.00001) doesn't seem like much, does it? With a sample size of 15, we still fail to reject the null hypothesis (p = 0.8255).

    y <- rexp(15, 0.00030)     # data drawn with the "true" rate 0.00030
    ks.test(y, pexp, 0.00029)  # tested against the estimated rate 0.00029
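To make the estimation step concrete, here is a minimal sketch of how a rate might be estimated from inter-arrival times. It assumes the maximum-likelihood estimate, `1 / mean(x)`; the question does not say how 0.00029 was obtained, and the names `true_rate`, `sample_times`, and `rate_hat` are made up for illustration.

```r
set.seed(42)

true_rate <- 0.00030                       # assumed "population" rate, as in the text
sample_times <- rexp(1000, rate = true_rate)

# Maximum-likelihood estimate of an exponential rate: 1 / sample mean.
rate_hat <- 1 / mean(sample_times)

# The estimate lands near the true rate, but not exactly on it.
print(c(true = true_rate, estimate = rate_hat))
```

Note that the `ks.test` help page says the distribution parameters should be pre-specified rather than estimated from the same data, so a p-value obtained by plugging in such an estimate is only approximate.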

Now let's take a large sample of n = 1,000. In this example, even with such a small difference between the population rate and the estimated rate, we get a p-value of 0.07506, which is very close to the common 0.05 significance threshold.

    y2 <- rexp(1000, 0.00030)
    ks.test(y2, pexp, 0.00029)

In yet another sample of 1,000, we can get a p-value of 0.008196, which leads us to reject the null hypothesis at most conventional significance levels.
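Because a single p-value depends on the random draw, it can help to repeat the experiment many times and see how often the test rejects. The sketch below is not part of the original analysis: `rejection_rate` is a hypothetical helper, and the number of repetitions, the 0.05 level, and the sample sizes (including the extra n = 100,000 case) are arbitrary choices.

```r
# Hypothetical helper: fraction of simulated samples for which ks.test
# rejects the null rate at level alpha.
rejection_rate <- function(n, true_rate = 0.00030, null_rate = 0.00029,
                           reps = 200, alpha = 0.05) {
  p_values <- replicate(reps, {
    x <- rexp(n, rate = true_rate)
    ks.test(x, "pexp", rate = null_rate)$p.value
  })
  mean(p_values < alpha)
}

set.seed(1)
sizes <- c(15L, 1000L, 100000L)
rates <- sapply(sizes, rejection_rate)
names(rates) <- sizes

# Share of rejections at each sample size.
print(rates)
```

The same tiny gap between 0.00030 and 0.00029 should be rejected rarely at n = 15 and almost always at n = 100,000, which is the pattern the individual examples hint at.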

The moral of the story is that very small differences from the population parameter can be detected as "significantly different" given a large enough sample size.
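The output in the question can be read the same way. As a rough back-of-the-envelope check, for large n the two-sided KS p-value is approximately `2 * exp(-2 * n * D^2)`, the leading term of the Kolmogorov distribution. Assuming R reported this asymptotic p-value, the reported D and the bound on p imply a lower bound on the sample size. The variable names here are my own.

```r
D <- 0.023961        # KS statistic reported in the question
p_bound <- 2.2e-16   # the output only says "p-value < 2.2e-16"

# Solve 2 * exp(-2 * n * D^2) < p_bound for n (leading-term approximation).
n_min <- log(2 / p_bound) / (2 * D^2)

print(ceiling(n_min))
```

Under that approximation the sample must contain roughly 32,000 or more observations, which is large enough for a maximum CDF gap of about 0.024 to come out as highly significant.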

So rejecting the null hypothesis in a large sample doesn't necessarily mean that your estimated parameter or distribution is a poor fit. It only means that the KS test detected a statistically significant difference. And as some of us are fond of saying, statistical significance is not the same thing as practical significance.
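One way to judge practical significance is to look at the KS statistic D itself, since it is the largest vertical gap between the empirical CDF and the hypothesized CDF. The following sketch recomputes D by hand for simulated data and checks it against `ks.test`; the data and the names `obs`, `null_rate`, `F0`, and `D_manual` are illustrative, not taken from the question.

```r
set.seed(2024)

null_rate <- 0.00029
obs <- sort(rexp(1000, rate = 0.00030))   # simulated stand-in for real data
n <- length(obs)
F0 <- pexp(obs, rate = null_rate)         # hypothesized CDF at each observation

# The empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered value,
# so the largest gap has to be checked on both sides of each jump.
D_manual <- max(pmax(seq_len(n) / n - F0, F0 - (seq_len(n) - 1) / n))

D_ks <- unname(ks.test(obs, "pexp", rate = null_rate)$statistic)
print(c(manual = D_manual, ks.test = D_ks))
```

Read this way, a D of 0.024 as in the question means the hypothesized exponential CDF is never off from the empirical one by more than about 2.4 percentage points; whether that matters depends on the application.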