That is OK, and quite reasonable. It is known as the two-sample Kolmogorov-Smirnov test. Measuring the difference between two distribution functions by the sup norm is always sensible, but to do a formal test you need to know the distribution of the test statistic under the null hypothesis that the two samples are independent and each i.i.d. from the same underlying distribution. To rely on the usual asymptotic theory you will need continuity of the underlying common distribution (not of the empirical distributions). See the Wikipedia page linked to above for more details.
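As a rough illustration of the sup-norm idea, the statistic can be computed by hand from the two empirical distribution functions. This is a minimal sketch, not how ks.test is implemented internally; the seed and the samples x and y are made up for the example.

```r
# Two-sample K-S statistic computed by hand: the sup-norm distance
# between the two empirical distribution functions.
set.seed(1)          # arbitrary seed, only for reproducibility
x <- rnorm(50)       # hypothetical sample 1
y <- runif(30)       # hypothetical sample 2

Fx <- ecdf(x)        # empirical distribution function of x
Fy <- ecdf(y)        # empirical distribution function of y

# Both EDFs are step functions that jump only at observed points,
# so the supremum is attained at one of the pooled observations.
z <- sort(c(x, y))
D <- max(abs(Fx(z) - Fy(z)))

# Should agree with the statistic reported by ks.test()
D_ref <- unname(ks.test(x, y)$statistic)
print(c(by_hand = D, ks.test = D_ref))
```

Evaluating the difference only at the pooled sample points is enough because the difference of the two step functions is constant between consecutive observations.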
In R, you can use ks.test, which computes exact $p$-values for small sample sizes.
I am assuming you are asking because the Suanshu help page reports, in reference to the K-S distribution, "This is not done yet." Luckily, it is very easy to do in R. If x and y are your two samples, ks.test(x, y) returns the test statistic and p-value. For example,
> x <- rnorm(50)
> y <- runif(30)
> ks.test(x, y)
Two-sample Kolmogorov-Smirnov test
data: x and y
D = 0.5, p-value = 9.065e-05
alternative hypothesis: two-sided
By default, it will compute exact or asymptotic p-values based on the product of the sample sizes (exact p-values for n.x*n.y < 10000 in the two-sample case), or you can choose explicitly with the exact argument, exact=FALSE or exact=TRUE. According to the R documentation, the method of Marsaglia et al. (2003), which the Suanshu documentation also cites, is used for exact p-values in the one-sample two-sided case. Some large-sample approximations are given here, although I don't have a proper citation. Lastly, if you don't want to install R, there are web calculators for the two-sample K-S test, although I don't know if they use the same algorithm as R because the one I found only reported three decimal places for the p-value.
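To see the effect of the exact argument, the same pair of samples can be tested both ways. This is only a sketch: the seed and samples are made up, and the printed values will depend on them.

```r
# Compare exact and asymptotic p-values from ks.test on the same data.
set.seed(1)          # arbitrary seed, only for reproducibility
x <- rnorm(50)       # hypothetical sample 1
y <- runif(30)       # hypothetical sample 2

p_exact   <- ks.test(x, y, exact = TRUE)$p.value
p_asympt  <- ks.test(x, y, exact = FALSE)$p.value

# With no ties and 50 * 30 = 1500 < 10000, the default is the exact p-value.
p_default <- ks.test(x, y)$p.value

print(c(exact = p_exact, asymptotic = p_asympt, default = p_default))
```

Writing exact = TRUE and exact = FALSE in full, rather than T and F, avoids surprises if T or F has been reassigned elsewhere in the session.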
Best Answer
A 2004 article, On a new multivariate two-sample test by Baringhaus and Franz, may be helpful. They provide a brief literature review of two-sample multivariate goodness-of-fit tests and an R package, cramer. As the package name suggests, their method is related to Cramér's test, a predecessor of Cramér-von Mises. For the one-sample problem, Justel et al. developed a generalization of the Kolmogorov-Smirnov test. In general, the difficulty in the multivariate case seems to stem from extending the definition of the EDF (empirical distribution function), so methods based on other measures are worth exploring, e.g. multivariate tests based on the ECF (empirical characteristic function) by Fan.
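A minimal sketch of how the cramer package might be used on two bivariate samples follows. It assumes the package is installed from CRAN; the data are simulated purely for illustration, and the p-value comes from the package's default bootstrap settings, so it will vary with the seed.

```r
# Sketch only: runs the test if the CRAN package 'cramer' is available.
if (requireNamespace("cramer", quietly = TRUE)) {
  set.seed(1)                                    # arbitrary seed
  # Hypothetical data: rows are observations, columns are dimensions.
  x <- matrix(rnorm(60), ncol = 2)               # 30 bivariate points
  y <- matrix(rnorm(60, mean = 0.5), ncol = 2)   # 30 shifted bivariate points

  res <- cramer::cramer.test(x, y)               # two-sample multivariate test
  print(res$p.value)
}
```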