Solved – Use of two-sample Kolmogorov-Smirnov test to evaluate similarities between two different distributions

kolmogorov-smirnov test, r

$X_1, X_2, \dots, X_n$ and $Y_1, Y_2, \dots, Y_n$ (with $n = 1000$) are two samples of physical quantities obtained by applying two different mathematical models to some independent and identically distributed (iid) data.

The mathematical model used to generate $Y_i$ is a simplified version (tuned by the parameter $r$) of the mathematical model used to generate $X_i$.

My goal is to find the value of $r$ that makes the empirical distribution $F_Y(y)$ as similar as possible to the distribution $F_X(x)$.

I decided to use the two-sample Kolmogorov-Smirnov test (in R) for different values of $r$ ranging over a specific interval. Is this choice correct?

I know that the null hypothesis for the K-S test is that the two distributions are the same. However, I know for sure that the two distributions are different because the two mathematical models are different. Is it correct to evaluate the best value of $r$ by looking at the p-value and the D statistic coming from the K-S test?

Best Answer

You say you're trying to make the two distributions close; the $D$ statistic measures the discrepancy between the two, and (as long as changing $r$ doesn't change the dimension of the fit) choosing $r$ to minimize that discrepancy makes complete sense.
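To make the idea concrete, here is a minimal sketch of a grid search that picks the $r$ with the smallest $D$. It is written in plain Python rather than R so that the computation of $D$ is visible. The names `ks_statistic`, `best_r`, `full_model` and `simplified_model` are made up for this illustration, the two toy models are placeholders and not the models from the question, and the sketch assumes both models are applied to the same input data.

```python
import bisect
import math
import random


def ks_statistic(x, y):
    """Two-sample KS statistic: D = max over t of |F_X(t) - F_Y(t)|."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed
    # values, so it is enough to compare them at every observed value.
    for t in xs + ys:
        f_x = bisect.bisect_right(xs, t) / n  # fraction of x <= t
        f_y = bisect.bisect_right(ys, t) / m  # fraction of y <= t
        d = max(d, abs(f_x - f_y))
    return d


def best_r(z, r_grid, full_model, simplified_model):
    """Return (D, r) for the grid value of r with the smallest D."""
    x = [full_model(v) for v in z]
    scored = []
    for r in r_grid:
        y = [simplified_model(v, r) for v in z]
        scored.append((ks_statistic(x, y), r))
    return min(scored)  # tuples compare on D first


# Hypothetical stand-ins for the two models (NOT the models in the question).
def full_model(v):
    return math.exp(v)


def simplified_model(v, r):
    return 1.0 + r * v


rng = random.Random(0)                          # fixed seed for repeatability
z = [rng.gauss(0.0, 0.3) for _ in range(1000)]  # stand-in for the iid data
r_grid = [0.5 + 0.05 * k for k in range(21)]    # assumed search interval for r
d_min, r_best = best_r(z, r_grid, full_model, simplified_model)
print(f"smallest D = {d_min:.4f} at r = {r_best:.2f}")
```

In R, `ks.test(x, y)$statistic` returns the same $D$, so the loop over $r$ can call that instead of a hand-written function.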

I don't think there's any need to deal with the $p$-value; $D$ is a sensible thing to optimize.
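One way to see why the $p$-value adds nothing for this purpose: with both sample sizes held fixed, the $p$-value is a decreasing function of $D$, so ranking values of $r$ by $D$ or by $p$-value gives the same order. As a sketch, using the usual large-sample approximation (software may compute the $p$-value differently, for example exactly for small samples), with $m$ denoting the size of the second sample:

$$
D = \sup_t \left| F_X(t) - F_Y(t) \right|, \qquad
p \approx 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 \lambda^2}, \qquad
\lambda = D \sqrt{\frac{n m}{n + m}}
$$

Here $n = m = 1000$, so $\lambda = D\sqrt{500}$ and the smallest $D$ always corresponds to the largest $p$-value.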

If you're changing the dimension of the fit (adding or removing parameters), just minimizing $D$ isn't going to be sufficient, since more parameters will always tend to improve the fit.