R – Two-Sample Kolmogorov-Smirnov Test P-Value Confusion

kolmogorov-smirnov test, p-value, r

I'm confused about the appropriate interpretation of p-values returned by the two-sample Kolmogorov-Smirnov test (ks.test) in R.

In slide 23 of this presentation about non-parametric two-sample tests, the author states that when analyzing the ks.test results:

ks.test(male, female)
Two-sample Kolmogorov-Smirnov test
data: male and female 
D = 0.8333, p-value = 0.02597

the p-value

needs to be multiplied by 2 for a 2-tail test. Thus, P = 0.05194

Is that true?

If we used the original p = 0.02597, we would reject the hypothesis that the distributions are the same, because p < 0.05, correct? Whereas if we multiplied it by 2, we would fail to reject that hypothesis, since p > 0.05?

What am I missing?

Best Answer

No, it's wrong. The default Kolmogorov-Smirnov test in R is already two-sided, i.e. its alternative is already $F_X\neq F_Y$ rather than $F_X<F_Y$ or $F_X>F_Y$ (in all three cases, we should add "somewhere").

If you had done a one-tailed test but intended to do a two-tailed test (and the sample turned out to have a difference in the direction you tested for), doubling the p-value is usually reasonably close to correct, but strictly speaking it is still wrong.

In the case of the t-test, the events of rejecting in each tail are mutually exclusive, so you can just add their probabilities, and the null distribution is symmetric, so adding is doubling. For the Kolmogorov-Smirnov test they're not mutually exclusive: each of the one-tailed Kolmogorov-Smirnov tests can reject on the same sample. However, under the null it's relatively rare to be able to reject in both directions, so doubling is generally not a bad approximation.
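The overlap between the two tails can be checked by brute force for small samples. The following is a minimal sketch, assuming two samples of six observations each with no ties (the sizes implied by D = 0.8333 in the question). It enumerates all 924 equally likely arrangements under the null. The helper name `tail_probs` is made up for this illustration.

```r
# Under the null (no ties), every split of the pooled ranks 1..12 into
# two groups of six is equally likely; enumerate all choose(12, 6) = 924.
n <- 6
splits <- combn(2 * n, n)            # each column: ranks belonging to sample x

# For each split, track n * (F_x - F_y) along the pooled sample.
# Equal sample sizes keep this integer-valued, so comparisons are exact.
cdf_gap <- apply(splits, 2, function(x_ranks) {
  in_x <- seq_len(2 * n) %in% x_ranks
  cumsum(in_x) - cumsum(!in_x)
})
d_plus  <- apply(cdf_gap, 2, max)    # n * D+ (F_x above F_y somewhere)
d_minus <- apply(-cdf_gap, 2, max)   # n * D- (F_x below F_y somewhere)

# Null probabilities for a threshold of k/n on the D scale (hypothetical helper).
tail_probs <- function(k) {
  up   <- d_plus  >= k
  down <- d_minus >= k
  c(one_sided = mean(up), both = mean(up & down), two_sided = mean(up | down))
}

tail_probs(5)   # D >= 5/6: 12/924, 0, 24/924 (doubling happens to be exact)
tail_probs(3)   # D >= 3/6: 220/924, 2/924, 438/924 (doubling overshoots slightly)
```

At the threshold from the question (5/6) no arrangement exceeds it in both directions, so doubling happens to be exact for these sample sizes. At a smaller threshold such as 3/6, two arrangements exceed it in both directions, and twice the one-tailed probability (440/924) is slightly larger than the true two-tailed probability (438/924).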

It's just unnecessary here, since the ks.test function will happily calculate two-tailed p-values for us without our doing a thing; in fact we have to explicitly ask for a one-tailed one.
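As a quick check, the sketch below calls ks.test with and without the `alternative` argument. The data are hypothetical: the question does not give the original `male` and `female` values, so these were chosen only to have six untied observations per group and D = 5/6.

```r
# Hypothetical data (not the values from the slide): six per group, no ties,
# constructed so that the largest ECDF gap is 5/6.
male   <- c(1, 2, 3, 4, 5, 12)
female <- c(6, 7, 8, 9, 10, 11)

two_sided <- ks.test(male, female)                             # default
explicit  <- ks.test(male, female, alternative = "two.sided")  # same test
one_sided <- ks.test(male, female, alternative = "greater")    # must be requested

two_sided$alternative   # "two-sided"
two_sided$statistic     # D = 0.8333
two_sided$p.value       # about 0.02597, i.e. 24/924; no doubling needed
one_sided$alternative   # describes the one-sided alternative actually tested
```

The default call and the explicit `alternative = "two.sided"` call return the same p-value, which matches the 0.02597 reported in the question.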