Kolmogorov-Smirnov Test – Can You Calculate the Power of a Kolmogorov-Smirnov Test in R?

Tags: kolmogorov-smirnov-test, r, statistical-power

Is it possible to do a power analysis for a two-sided Kolmogorov-Smirnov test in R?

I am testing whether two empirical distributions differ using ks.test(), and am looking to add a power analysis.

I haven't been able to find any built-in power analyses for KS tests in R. Any suggestions?

Edit: These are randomly generated distributions that closely approximate my data (with the true sample sizes and estimated decay rates for the exponential distributions).

set.seed(100)
x <- rexp(64, rate = 0.34)
y <- rexp(54, rate = 0.37)

# K-S test: do x and y come from the same distribution?
ks.test(x, y)

These data are measures of body size in two different groups. I want to show that the two groups have essentially the same distribution, but was asked by a collaborator if I had the power to say that based on sample sizes. I've randomly drawn from an exponential distribution here, but these are close to the real data.

So far, I've said that there is no significant difference in these distributions based on the two-sided K-S test. I've also plotted the two distributions. How can I show that I have the power to make such a statement, given the sample sizes and decay rates for x and y?
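
For reference, a minimal sketch of how the two empirical distributions might be plotted alongside the test (the original post doesn't include its plotting code, so this is only one way to do it):

# Sketch: overlay the two empirical CDFs that ks.test() compares
plot(ecdf(x), main = "Empirical CDFs", xlab = "Body size", col = 1)
lines(ecdf(y), col = 2)
legend("bottomright", legend = c("x (n = 64)", "y (n = 54)"),
       col = c(1, 2), lty = 1)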

Best Answer

Finding power against exponential scale-shift alternatives is reasonably straightforward.

However, I don't know that you should be using values calculated from your data to work out what the power might have been. That sort of post hoc power calculation tends to result in counter-intuitive (and often misleading) conclusions.

Power, like significance level, is a population-level phenomenon and one you deal with before the fact; you'd use a priori understanding (including theory, reasoning, or any previous studies) to decide on a reasonable set of alternatives to consider, and a desirable effect size.

You can also consider a variety of other alternatives (e.g. you could embed the exponential inside a gamma family to consider the impact of more or less skew cases).
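
For instance, here is a minimal sketch of how one might simulate power against such gamma alternatives, holding the mean fixed at 1 so that only the shape (and hence the skewness) changes; the shape grid, nsim, and sample sizes are illustrative assumptions, not values from the question:

# Sketch: power of the two-sample K-S test against gamma alternatives,
# with the exponential (shape = 1) embedded in the gamma family.
# Setting rate = shape keeps the mean at 1, so only the skewness varies.
alpha <- 0.05; n1 <- 54; n2 <- 64; nsim <- 2000   # illustrative values
shapes <- c(0.5, 0.75, 1, 1.5, 2)                 # shape = 1 is the exponential

power_gamma <- sapply(shapes, function(sh)
  mean(replicate(nsim,
    ks.test(rexp(n1, 1), rgamma(n2, shape = sh, rate = sh))$p.value) < alpha))

round(data.frame(shape = shapes, power = power_gamma), 3)

At shape = 1 the two populations coincide, so the estimated "power" there should sit near alpha, which gives a handy sanity check on the simulation.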

The usual questions one might try to answer by a power analysis are:

  1. what's the power, for a given sample size, at some effect size or set of effect sizes*?

  2. given a sample size and power, how large an effect is detectable?

  3. given a desired power for a particular effect size, what sample size would be required?

*(where here 'effect size' is intended generically, and might be, for example, a particular ratio of means or difference of means, not necessarily standardized).

Clearly you already have a sample size, so you're not in case (3). You might reasonably consider case (2) or case (1).

I'd suggest case (1) (which also gives a way to deal with case (2)).

To illustrate an approach to case (1) and see how it relates to case (2), let's consider a specific example, with:

  • scale shift alternatives

  • exponential populations

  • sample sizes in the two samples of 64 and 54

Because the sample sizes are different, we have to consider both the case where the scale ratio is smaller than 1 and the case where it is larger than 1 (if the samples were the same size, symmetry considerations would make it possible to consider just one side). However, because they're quite close to the same size, the effect is very small. In any case, fix the parameter for one of the samples and vary the other.

So what one does is:

Beforehand:

choose a set of scale multipliers representing different alternatives
select an nsim (say 1000)
set mu1=1

To do the calculations:

for each possible scale multiplier, kappa 
  repeat nsim times
    generate a sample of size n1 from Exp(mu1) and n2 from Exp(kappa*mu1)
    perform the test
  compute the rejection rate across nsim tests at this kappa

In R, I did this:

alpha <- 0.05
n1 <- 54
n2 <- 64
nsim <- 10000
s <- c(1.1, 1.2, 1.5, 2, 2.5, 3)  # set up grid for kappa
s <- c(1/rev(s), 1, s)            # also below and at 1
rr <- numeric(length(s))          # to hold rejection rates

for (i in seq_along(s)) rr[i] <- mean(replicate(nsim,
    ks.test(rexp(n1, 1), rexp(n2, s[i]))$p.value) < alpha)

plot(rr ~ s, log = "x", ylim = c(0, 1), type = "n")  # set up plot
points(rr ~ rev(s), col = 3)  # reversed case shows the (tiny) asymmetry + noise
points(rr ~ s, col = 1)       # plot the "real" case last
abline(h = alpha, col = 8, lty = 2)  # draw in alpha

which gives the following power "curve"

[Figure: simulated power of the two-sample K-S test; rejection rate plotted against the scale ratio s on a log-scaled x-axis, with a dashed line at alpha = 0.05.]

The x-axis is on a log scale; the y-axis is the rejection rate.

It's hard to tell here, but the black points are slightly higher on the left than the right (that is, there's fractionally more power when the larger sample has the smaller scale).
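
As a small numeric check of that asymmetry (reusing s and rr from the code above; the grid is symmetric in 1/kappa, so reversing rr pairs each kappa with its reciprocal):

# Rejection rate at each kappa > 1 next to the rate at 1/kappa
cbind(kappa = s[s > 1],
      rr_at_kappa = rr[s > 1],
      rr_at_1_over_kappa = rev(rr)[s > 1])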

Using the inverse normal cdf as a transformation of the rejection rate, we can make the relationship between the transformed rejection rate and log kappa (kappa is s in the plot, but the x-axis is log-scaled) very nearly linear, except near 0. The number of simulations was high enough that the noise is very low; we can just about ignore it for present purposes.

So we can just use linear interpolation. Shown below are approximate effect sizes for 50% and 80% power at your sample sizes:

[Figure: probit-transformed rejection rate against s on a log-scaled x-axis, with linear interpolation marking the approximate scale ratios giving 50% and 80% power.]
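
If you want the numbers rather than reading them off the plot, here is a minimal sketch of the interpolation step, reusing s and rr from the simulation above (the 50% and 80% targets are the ones marked in the figure):

# Probit-transform the rejection rates on the kappa > 1 branch, then
# linearly interpolate log(kappa) at the probit of the target powers
ok <- s > 1 & rr > 0 & rr < 1   # right branch only; keeps qnorm() finite
z  <- qnorm(rr[ok])             # probit of the rejection rate
lk <- log(s[ok])                # log scale ratio

kappa50 <- exp(approx(z, lk, xout = qnorm(0.50))$y)
kappa80 <- exp(approx(z, lk, xout = qnorm(0.80))$y)
round(c(power50 = kappa50, power80 = kappa80), 2)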

The effect sizes on the other side (where the larger group has the smaller scale) are only slightly shifted from these (the test can pick up a fractionally smaller effect size), but it makes little difference, so I won't labor the point.

So the test will pick up a substantial difference (from a ratio of scales of 1), but not a small one.


Now for some comments: I don't think hypothesis tests are particularly relevant to the underlying question of interest (are the two distributions quite similar?), and consequently these power calculations don't tell us anything directly relevant to that question.

I think you address that more useful question by prespecifying what you think "essentially the same" actually means, operationally. That, pursued rationally into a statistical procedure, should lead to a meaningful analysis of the data.
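
For what it's worth, here is a minimal sketch of what that could look like, assuming (purely for illustration) that "essentially the same" is operationalized as the ratio of means lying within prespecified equivalence bounds; the (1/1.25, 1.25) bounds and the 90% confidence level are placeholders I chose for the example, not recommendations:

# Exact CI for the ratio of exponential means: (xbar/ybar)*(mu2/mu1)
# has an F(2*n1, 2*n2) distribution when both samples are exponential
ratio_ci <- function(x, y, level = 0.90) {
  a <- (1 - level) / 2
  r <- mean(x) / mean(y)
  r / qf(c(1 - a, a), 2 * length(x), 2 * length(y))  # c(lower, upper)
}

set.seed(100)
x <- rexp(64, rate = 0.34)
y <- rexp(54, rate = 0.37)

ci <- ratio_ci(x, y)
all(ci > 1/1.25 & ci < 1.25)  # TRUE would support "essentially the same"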