Solved – Is the Kolmogorov-Smirnov test too strict if the sample size is large?

Tags: kolmogorov-smirnov test, large data, statistical significance

I often hear the statement that the KS test (for comparing two distributions, each described by a sample) is too strict when the sample size is large, meaning that the null hypothesis of equal distributions is rejected too often.

In my application I have 2 samples of 1500 observations each, and the KS statistic has a low p-value. Nevertheless, my data look similar when I plot histograms/density estimates.

My question: can the claim that the KS test is too strict be made more rigorous (can we state a threshold for the number of observations)? Is it "true" at all? Is there any reference?

Thank you!
PS: I know this looks like a duplicate, but when I checked the first suggestions from the site I didn't see a clear duplicate. If this is a duplicate, please refer me to it! 🙂

EDIT: Some background. The distributions I want to compare are explanatory variables in a logistic regression model. The model was developed on one sample (A) and I want to apply it to another sample (B). I cannot test the model because I don't know the outcome on B. One approach people use here to assess whether the model gives reasonable results on B is to test whether the variables have similar distributions (KS test, chi-squared test).

Is there a better approach to assess whether it makes sense statistically (in terms of discriminatory power) to apply a model developed on A to B without knowing the outcomes on B?

Best Answer

With a test you try to find deviations from the null hypothesis. The larger the sample, the better we are at detecting such deviations, even trivially small ones. So if you test in large samples, you will reject the null hypothesis quite often because of substantively trivial deviations. This is what many people mean when they say that statistical tests reject the null too often in large samples.
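The same point can be made quantitative with the standard large-sample rejection rule for the two-sample KS test. This is the asymptotic critical value, given here as a sketch; exact small-sample tables differ slightly. Here \hat F_{A,n} and \hat F_{B,m} denote the empirical CDFs of samples A and B, with sizes n and m.

```latex
D_{n,m} = \sup_x \bigl|\hat F_{A,n}(x) - \hat F_{B,m}(x)\bigr|,
\qquad
\text{reject } H_0 \text{ at level } \alpha \text{ if }
D_{n,m} > c(\alpha)\sqrt{\frac{n+m}{nm}},
\qquad
c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}.

% For alpha = 0.05, c(alpha) \approx 1.358:
\begin{aligned}
n = m = 100:&\quad D_{\text{crit}} \approx 1.358\sqrt{200/100^2} \approx 0.192\\
n = m = 1500:&\quad D_{\text{crit}} \approx 1.358\sqrt{3000/1500^2} \approx 0.0496
\end{aligned}
```

The critical value shrinks like 1/\sqrt{n}, so any fixed nonzero gap between the true CDFs, however small, is eventually rejected with probability approaching 1. There is therefore no particular sample size at which the test "becomes" too strict. With n = m = 1500, a maximal gap of about 0.05 between the two empirical CDFs is already enough to reject at the 5% level, and a gap of that size can easily look "similar" in a histogram.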

Strictly speaking, they are wrong: the test correctly answered the question the user posed; it is just that this was not the question (s)he wanted to ask... But try to devise a testing procedure for the hypothesis "the two distributions are equal, ignoring substantively trivial differences". We (humans) can decide what is substantively trivial; a procedure like statistical testing cannot.
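A small simulation illustrates this. It holds a fixed, arguably trivial difference between the two distributions and tracks what happens to D and the p-value as n grows. The difference used here is a mean shift of 0.1 standard deviations between two normals, an assumption chosen purely for illustration. The helpers `ks_statistic`, `ks_pvalue` and `demo` are hypothetical names written from scratch using the asymptotic Kolmogorov distribution. In practice SciPy's `scipy.stats.ks_2samp` is a tested implementation of the same test.

```python
import bisect
import math
import random


def ks_statistic(x, y):
    """Two-sample KS statistic D = sup_t |F_x(t) - F_y(t)| (empirical CDFs)."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both ECDFs are right-continuous step functions that only jump at
    # observed values, so the supremum is attained at one of those values.
    for t in xs + ys:
        fx = bisect.bisect_right(xs, t) / n
        fy = bisect.bisect_right(ys, t) / m
        d = max(d, abs(fx - fy))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # Q(lam) is indistinguishable from 1 in this range
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(1.0, max(0.0, total))


def demo(shift=0.1, sizes=(100, 1500, 20000), seed=1):
    """Compare N(0, 1) with N(shift, 1) at several equal sample sizes (illustrative)."""
    rng = random.Random(seed)
    results = []
    for n in sizes:
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]
        d = ks_statistic(a, b)
        results.append((n, d, ks_pvalue(d, n, n)))
    return results


if __name__ == "__main__":
    for n, d, p in demo():
        print(f"n = m = {n:>6}: D = {d:.3f}, asymptotic p = {p:.2e}")
```

The population gap between the two CDFs is the same at every n (about 0.04 for this shift). The sample D estimates it more precisely as n grows, while the p-value collapses toward zero. Whether a gap of that size matters for applying the logistic-regression model to B is the substantive judgment the test cannot make for you.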