Solved – Even more with the Kolmogorov-Smirnov test with R software

kolmogorov-smirnov test, multinomial-distribution, r

This follows on from the previous question on differences between K-S manual test and K-S test with R.

My frequency sample was

a=c(0,1,1,4,9)

Then the observed sample is

 obs=c(2,3,4,4,4,4,5,5,5,5,5,5,5,5,5)

The expected sample is then

exp=c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)

I hope you agree.
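As a quick check, both samples can be rebuilt from the frequency vector. This is only a sketch: it assumes the frequencies in `a` belong to the values 1 to 5, and it uses the name `expected` (not in the original post) so that the variable does not share a name with R's built-in `exp` function.

```r
# Frequencies of the values 1..5, as given in the question
a <- c(0, 1, 1, 4, 9)
n <- sum(a)                         # total number of observations

# Observed sample: each value repeated as often as it was counted
obs <- rep(seq_along(a), times = a)

# "Expected" sample under equal frequencies: n / 5 copies of each value
# (`expected` is an illustrative name; the post calls this vector `exp`)
expected <- rep(seq_along(a), each = n / length(a))

print(obs)
print(expected)
```

Both vectors match the ones typed by hand above.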

First, I use ks.test as in the previous question:

ks.test(obs,exp)

data:  obs and exp

D = 0.4667, p-value = 0.07626

Then I use ks.test another way:

The expected distribution can be the uniform. Do you agree?

And then:

ks.test(obs, "punif", 0,5)

data:  obs 

D = 0.6667, p-value = 3.239e-06

Question

  • Why do the two approaches give different results?

Best Answer

The first is a two-sample test; the second is a one-sample test against a continuous distribution. Neither is used correctly:

  • The two-sample test views both sets of data as being data, but your "expected sample" is not data, it's a theoretical reference. It is not subject to any variation. The two-sample test thinks that it can vary. That's why the p-value is so large.

  • The reference distribution used in the one-sample test is a continuous uniform distribution between 0 and 5. However, these data look discrete: from the way they are given, it appears they can attain only the values 1, 2, ..., 5. Because the one-sample test doesn't know this, its p-value is probably too small.
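One way to respect both points at once is to keep the K-S distance but simulate its reference distribution under a discrete uniform null on 1, ..., 5. The following is a rough sketch rather than a standard procedure: the helper `ks_stat`, the seed, and the number of replicates are all illustrative choices that do not come from the original post.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
a  <- c(0, 1, 1, 4, 9)           # observed frequencies of the values 1..5
n  <- sum(a)
F0 <- (1:5) / 5                  # CDF of the discrete uniform at 1..5

# K-S distance between the empirical CDF of a frequency vector and F0
# (hypothetical helper, evaluated only at the support points 1..5)
ks_stat <- function(counts) max(abs(cumsum(counts) / sum(counts) - F0))

d_obs <- ks_stat(a)

# Simulate frequency vectors under the null: 15 draws, five equally likely values
sims  <- rmultinom(20000, size = n, prob = rep(1 / 5, 5))  # one column per replicate
d_sim <- apply(sims, 2, ks_stat)

# Share of simulated distances at least as large as the observed one;
# the small tolerance guards against floating-point ties
p_sim <- mean(d_sim >= d_obs - 1e-12)

print(c(D = d_obs, p_simulated = p_sim))
```

Here the reference is treated as fixed (unlike in the two-sample test) and as discrete (unlike in the call with "punif"), so the simulated p-value is a fairer guide than either figure above, up to Monte Carlo error.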

At least this lets us infer that the correct p-value should lie somewhere between 0.076 and 3.2e-06. Because that doesn't settle the question, let's analyze further.

To get a sense of whether the data (0, 1, 1, 4, 9) differ significantly from the discrete uniform frequencies (3, 3, 3, 3, 3), view the latter as describing a five-sided die. What are the chances that in 0+1+1+4+9 = 15 tosses of this die at least one value would appear 9 or more times? The events (1 appears 9 or more times), (2 appears 9 or more times), ..., (5 appears 9 or more times) are mutually exclusive--no two of them can hold at once, because two counts of 9 or more would require at least 18 tosses--so their probabilities add. Because the die is uniform, each of these five events has the same probability. We can compute the chance that a 5 comes up 9 or more times by viewing it like tosses of a biased coin: a 5 has a 1/5 chance; a non-5 has a 4/5 chance. The chance of 9 or more 5's therefore equals

$$\binom{15}{9}(1/5)^9(4/5)^6 + \binom{15}{10}(1/5)^{10}(4/5)^5 + \cdots + \binom{15}{15}(1/5)^{15}(4/5)^0.$$

This value is approximately 0.000785. Multiplying by 5 gives about 0.0039 = 0.39%, still quite small. Thus this set of frequencies is unlikely to have arisen through a single experiment in which each of the values has an equal chance of arising.
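The binomial tail is easy to check in R. This is a minimal sketch using the base functions `dbinom` and `pbinom`; the variable names are illustrative only.

```r
n <- 15      # number of tosses
p <- 1 / 5   # chance of any particular face on a fair five-sided die
k <- 9       # largest observed count

# P(X >= 9) for X ~ Binomial(15, 1/5), summing the individual binomial terms
p_one <- sum(dbinom(k:n, size = n, prob = p))

# The same tail from the distribution function: P(X >= 9) = P(X > 8)
p_one_alt <- pbinom(k - 1, size = n, prob = p, lower.tail = FALSE)

# The five events are mutually exclusive (2 * 9 > 15), so their probabilities add
p_any <- 5 * p_one

print(c(one_face = p_one, one_face_alt = p_one_alt, any_face = p_any))
```

The two ways of computing the tail should agree to within rounding error, and both match the figures quoted above.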