Solved – Can you use the Kolmogorov-Smirnov test to directly test for equivalence of two distributions

distributions, equivalence, kolmogorov-smirnov test, tost

Other questions have discussed how one might apply the Two One-Sided Tests (TOST) approach to the Kolmogorov-Smirnov (KS) test, but I was wondering whether it is possible to use the test statistic directly to show that two distributions are similar.

As far as I understand it, the KS test statistic represents the biggest difference between two CDFs, with the one-sample version originally being used as a goodness-of-fit test. In [1], a poor fit shows up as the empirical distribution crossing outside the confidence band around the hypothesised distribution (i.e. at least one point is too far from the distribution being tested against).
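As a minimal sketch of that description, the two-sample statistic can be computed by hand as the largest vertical gap between the two empirical CDFs and compared with what `ks.test` reports. The helper name `ks_stat`, the seed, and the simulated samples are illustrative assumptions, not part of the original question.

```r
# Hypothetical helper: two-sample KS statistic computed from the empirical CDFs.
ks_stat <- function(x, y) {
  grid <- sort(c(x, y))           # the supremum is attained at an observed data point
  Fx <- ecdf(x)                   # empirical CDF of the first sample
  Fy <- ecdf(y)                   # empirical CDF of the second sample
  max(abs(Fx(grid) - Fy(grid)))   # largest vertical gap between the two ECDFs
}

set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(50)
y <- runif(30)

d_manual  <- ks_stat(x, y)
d_builtin <- unname(ks.test(x, y)$statistic)

# Should agree up to floating-point error
all.equal(d_manual, d_builtin)
```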

If the two-sample version is often used to show that two distributions are significantly different from one another, in a similar way to the one-sample version, can we invert the calculation of the confidence intervals from using $(1-\alpha) = 0.05$ to instead use $(1-\alpha) = 0.95$, as a way of showing that the two distributions are significantly similar in terms of their maximum difference?

[1] Massey, F. "The Kolmogorov-Smirnov test for goodness-of-fit", Journal of the American Statistical Association, vol. 46, no. 253, pp. 68-78, Mar 1951

Best Answer

When conducting the Kolmogorov-Smirnov test, we assume $H_0:$ the two distributions are equivalent. We then calculate a test statistic and, if the corresponding $p$-value is small enough, we reject $H_0$ and conclude $H_A:$ the two distributions are different.

As far as hypothesis tests go, we use a $p$-value to quantify the amount of evidence we have to reject the null hypothesis. A $p$-value of 1 indicates that we have gathered no evidence to reject the null hypothesis. A $p$-value close to 0 indicates there is overwhelming evidence to reject the null hypothesis.

Let's assume we have data and calculate a $p$-value from the K-S test where $p=0.99.$ This indicates there is very little evidence to reject the null hypothesis. However, we cannot establish a standard of $\alpha=0.95$ such that $p>\alpha$ implies that we conclude the null hypothesis is correct. Further, I don't believe there is an alternate test that would allow us to conclude that the two distributions are the same.

What I believe you can do is be entirely honest in the write-up or discussion. Mention that you ran a K-S test, report the $p$-value, and, if the $p$-value is sufficiently high, state that there is very little evidence to suggest that the two distributions are different. So, while you cannot conclude that the distributions are identical, you can note that you found no evidence that they differ. The larger your sample size $n$, the more faith you can have in this statement, because a small sample may simply lack the power to detect a real difference.
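The role of sample size can be illustrated with a small simulation. This is only a sketch under assumed settings: the helper `non_rejection_rate`, the shift of 0.3 between two normal populations, the 0.05 cut-off, and the number of replications are all arbitrary choices made for illustration. It estimates how often the K-S test fails to reject even though the two populations really do differ.

```r
# Hypothetical helper: share of simulated K-S tests with p > 0.05 when the two
# populations truly differ (N(0, 1) versus N(shift, 1)).
non_rejection_rate <- function(n, shift = 0.3, reps = 500) {
  p <- replicate(reps, ks.test(rnorm(n), rnorm(n, mean = shift))$p.value)
  mean(p > 0.05)
}

set.seed(42)                              # arbitrary seed
rate_small <- non_rejection_rate(20)      # small samples
rate_large <- non_rejection_rate(500)     # large samples

# Proportion of runs in which the test found "no evidence of a difference"
c(n20 = rate_small, n500 = rate_large)
```

With small samples, a large $p$-value is common even when the distributions differ, which is why a non-significant result alone is weak support for similarity.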

It's not quite the answer that you were probably looking for, but it's not a total wash, either. Hope this helps!

Related Solutions

Solved – use Kolmogorov-Smirnov to compare two empirical distributions

That is OK, and quite reasonable. It is referred to as the two-sample Kolmogorov-Smirnov test. Measuring the difference between two distribution functions by the sup norm is always sensible, but to do a formal test you need to know the distribution of the statistic under the hypothesis that the two samples are independent and each i.i.d. from the same underlying distribution. To rely on the usual asymptotic theory, you need continuity of the underlying common distribution (not of the empirical distributions). See the Wikipedia page on the Kolmogorov-Smirnov test for more details.

In R, you can use the ks.test function, which computes exact $p$-values for small sample sizes.
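The null distribution described above can also be approximated by permutation: if both samples come from the same distribution, the group labels are exchangeable, so reshuffling them mimics the null hypothesis. The following is a rough sketch of that idea, not what `ks.test` does internally; the helper `ks_perm_pvalue`, the number of permutations, and the simulated data are assumptions made for illustration.

```r
# Hypothetical helper: permutation p-value for the two-sample KS statistic.
ks_perm_pvalue <- function(x, y, reps = 2000) {
  # Largest vertical gap between the two empirical CDFs
  ks_d <- function(a, b) {
    grid <- c(a, b)
    max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
  }
  observed <- ks_d(x, y)
  pooled <- c(x, y)
  n_x <- length(x)
  perm_d <- replicate(reps, {
    idx <- sample(length(pooled), n_x)   # random relabelling of the pooled data
    ks_d(pooled[idx], pooled[-idx])
  })
  # Add 1 to numerator and denominator so the estimate is never exactly zero
  (1 + sum(perm_d >= observed)) / (reps + 1)
}

set.seed(7)                              # arbitrary seed
p_same <- ks_perm_pvalue(rnorm(40), rnorm(40))               # same distribution
p_diff <- ks_perm_pvalue(rnorm(40), rnorm(40, mean = 1.5))   # clearly shifted
c(same = p_same, different = p_diff)
```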

Solved – Kolmogorov-Smirnov two-sample test

I am assuming you are asking because the Suanshu help page reports, in reference to the K-S distribution, "This is not done yet." Luckily, it is very easy to do in R. If x and y are your two samples, ks.test(x, y) returns the test statistic and $p$-value. For example,

> x <- rnorm(50)
> y <- runif(30)
> ks.test(x, y)    
        Two-sample Kolmogorov-Smirnov test    
data:  x and y 
D = 0.5, p-value = 9.065e-05
alternative hypothesis: two-sided

By default, it will compute exact or asymptotic p-values based on the product of the sample sizes (exact p-values for n.x*n.y < 10000 in the two-sample case), or you can specify this option with a third argument, exact=FALSE or exact=TRUE. Exact p-values are calculated using the methods of Marsaglia, et al. (2003), which the Suanshu documentation also cites. Some large sample approximations are given here, although I don't have a proper citation. Lastly, if you don't want to install R, there are web calculators for the two-sample K-S test, although I don't know whether they use the same algorithm as R, because the one I found reported the p-value to only three decimal places.
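As a short sketch of the `exact` argument, the same two samples can be tested both ways. The seed and the variable names `res_exact` and `res_asymp` are arbitrary choices for illustration. The statistic D is the same in both calls; only the way the p-value is computed changes.

```r
set.seed(123)                              # arbitrary seed, for reproducibility only
x <- rnorm(50)
y <- runif(30)

res_exact <- ks.test(x, y, exact = TRUE)   # exact p-value
res_asymp <- ks.test(x, y, exact = FALSE)  # asymptotic approximation

# Same D statistic, different p-value calculations
c(D = unname(res_exact$statistic),
  p_exact = res_exact$p.value,
  p_asymptotic = res_asymp$p.value)
```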
