Kolmogorov-Smirnov Test – How to Use Multivariate Two-Sample Kolmogorov-Smirnov Test

hypothesis testing, kolmogorov-smirnov test, multivariate analysis

Is there a multivariate alternative to the two-sample Kolmogorov–Smirnov test? What I mean is a test that can be used to check whether two underlying multidimensional distributions differ.

Best Answer

The 2004 article "On a new multivariate two-sample test" by Baringhaus and Franz may be helpful. They give a brief literature review of two-sample multivariate goodness-of-fit (GoF) tests and provide an R package, cramer. As the package name suggests, their method is related to Cramér's test, a predecessor of the Cramér-von Mises test.
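To make the idea concrete, here is a minimal base-R sketch of a Baringhaus-Franz-type statistic built from pairwise Euclidean distances, with a permutation p-value. The function name bf_perm_test and its n_perm argument are made up for this illustration, and the code is not the cramer package's implementation (in practice you would call that package's cramer.test function, which uses its own resampling scheme).

```r
# Sketch only: distance-based two-sample statistic with a permutation p-value.
# bf_perm_test and n_perm are illustrative names, not part of any package.
bf_perm_test <- function(x, y, n_perm = 999) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  m <- nrow(x)
  n <- nrow(y)
  # All pairwise Euclidean distances in the pooled sample, computed once.
  d <- as.matrix(dist(rbind(x, y)))
  # Statistic for a given choice of which pooled rows form the "first" sample:
  # mn/(m+n) * (mean between-sample distance
  #             - half the mean within-sample distance of each sample).
  stat <- function(ix) {
    iy <- setdiff(seq_len(m + n), ix)
    m * n / (m + n) *
      (mean(d[ix, iy]) - mean(d[ix, ix]) / 2 - mean(d[iy, iy]) / 2)
  }
  observed <- stat(seq_len(m))
  # Under the null hypothesis the group labels are exchangeable,
  # so reshuffle them and recompute the statistic.
  permuted <- replicate(n_perm, stat(sample(m + n, m)))
  list(statistic = observed,
       p.value = (1 + sum(permuted >= observed)) / (n_perm + 1))
}

set.seed(1)
x <- matrix(rnorm(60), ncol = 2)              # 30 points from N(0, I)
y <- matrix(rnorm(60, mean = 1.5), ncol = 2)  # 30 points, mean shifted
res <- bf_perm_test(x, y)
res$p.value
```

Large values of the statistic indicate that points from different samples are, on average, further apart than points from the same sample, which is why the p-value counts permutations with a statistic at least as large as the observed one.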

For the one-sample problem, Justel et al. developed a generalization of the Kolmogorov-Smirnov test. In general, the difficulty in the multivariate case seems to be rooted in extending the definition of the EDF (empirical distribution function), so methods based on other measures are worth exploring, e.g. multivariate tests based on the ECF (empirical characteristic function) by Fan.

Related Solutions

Hypothesis Testing – Use Kolmogorov-Smirnov to Compare Two Empirical Distributions

That is OK, and quite reasonable. It is referred to as the two-sample Kolmogorov-Smirnov test. Measuring the difference between two distribution functions by the sup norm is always sensible, but to do a formal test you want to know the distribution of the test statistic under the hypothesis that the two samples are independent and each i.i.d. from the same underlying distribution. To rely on the usual asymptotic theory you will need continuity of the underlying common distribution (not of the empirical distributions). See the Wikipedia page linked to above for more details.

In R, you can use ks.test, which computes exact $p$-values for small sample sizes.
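As a quick check of what the statistic is, the sketch below computes the sup-norm distance between the two empirical distribution functions by hand and compares it with the value ks.test reports. The sample sizes, the seed, and the variable names are arbitrary choices for the illustration.

```r
set.seed(42)
x <- rnorm(40)
y <- rnorm(60, mean = 0.5)

# Empirical distribution functions of the two samples.
Fx <- ecdf(x)
Gy <- ecdf(y)

# Both EDFs are right-continuous step functions that only jump at observed
# values, so the supremum of |Fx - Gy| is attained at a pooled data point.
pooled <- sort(c(x, y))
D_manual <- max(abs(Fx(pooled) - Gy(pooled)))

res <- ks.test(x, y)
all.equal(D_manual, unname(res$statistic))
```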

Kolmogorov-Smirnov Test – How to Perform a Kolmogorov-Smirnov Two-Sample Test

I am assuming you are asking because the Suanshu help page reports in reference to the K-S distribution, "This is not done yet." Luckily, it is very easy to do in R. If x and y are your two samples, ks.test(x, y) returns the test statistic and p-value. For example,

> x <- rnorm(50)
> y <- runif(30)
> ks.test(x, y)    
        Two-sample Kolmogorov-Smirnov test    
data:  x and y 
D = 0.5, p-value = 9.065e-05
alternative hypothesis: two-sided

By default, it will compute exact or asymptotic p-values based on the product of the sample sizes (exact p-values for n.x*n.y < 10000 in the two-sample case), or you can specify this option with a third argument, exact=FALSE or exact=TRUE. According to the R documentation, exact p-values in the one-sample two-sided case are calculated using the method of Marsaglia et al. (2003), which the Suanshu documentation also cites. Some large sample approximations are given here, although I don't have a proper citation. Lastly, if you don't want to install R, there are web calculators for the two-sample K-S test, although I don't know if they use the same algorithm as R because the one I found only reported three decimal places for the p-value.
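The sketch below shows the exact argument in use and compares the asymptotic p-value with the standard large-sample series for the Kolmogorov distribution, $P(D > d) \approx 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2\lambda^2}$ with $\lambda = d\sqrt{mn/(m+n)}$. The seed, the variable names, and the truncation at 100 terms are assumptions made for the illustration.

```r
set.seed(1)
x <- rnorm(50)
y <- runif(30)

# Force exact and asymptotic p-values explicitly.
p_exact  <- ks.test(x, y, exact = TRUE)$p.value
res_asym <- ks.test(x, y, exact = FALSE)
p_asym   <- res_asym$p.value

# Large-sample approximation computed by hand from the series above.
m <- length(x)
n <- length(y)
lambda <- sqrt(m * n / (m + n)) * unname(res_asym$statistic)
k <- 1:100   # truncation point; the terms decay very quickly here
p_series <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))

c(exact = p_exact, asymptotic = p_asym, series = p_series)
```

With samples this small the exact and asymptotic p-values can differ noticeably, which is the reason the default switches to the exact computation when the product of the sample sizes is small.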
