Python Kolmogorov-Smirnov Test: Unusual Results and Interpretation

Tags: kolmogorov-smirnov-test, python, scipy, similarities

I have difficulty understanding how the Kolmogorov-Smirnov test works. If I want to know whether my samples come from a specific distribution (for example the Weibull distribution), I can compare the p-value I get from scipy.stats to my significance level. If the p-value is higher than my chosen alpha (5%), my samples come from that distribution; if the p-value is below 5%, they come from a different one.

In this code example I don't understand the result. My samples come from the very distribution I test against, yet I get a p-value of 0, which would mean they come from a different distribution. That makes no sense to me.
It would be great if someone could help me out with this.

import scipy.stats as stats
import numpy as np

sampleData = stats.weibull_min.rvs(2.34, loc=0, scale=1, size=10000)
x = np.linspace(0, max(sampleData), num=10000, endpoint=True)

stats.kstest(stats.weibull_min.pdf(x, 2.34, loc=0, scale=1), sampleData)

#-> KstestResult(statistic=0.5031, pvalue=0.0)

I read that the KS test might not be great for large data sets. If someone has another idea for how I can compare two sample sets (without knowing the distribution behind them) to see how similar they are, I would appreciate it.

Best Answer

You got a couple of things wrong while reading the documentation of the Kolmogorov-Smirnov test.

First, you need to use the cumulative distribution function (CDF), not the probability density function (PDF). Second, you have to pass the CDF as a callable function rather than evaluating it on an equally spaced grid of points. Evaluating it doesn't work because the kstest function then assumes you are passing a second sample for a two-sample KS test.

from functools import partial

import numpy as np
import scipy.stats as stats


# Weibull distribution parameters
c, loc, scale = 2.34, 0, 1
# sample size
n = 10_000

x = stats.weibull_min.rvs(c, loc=loc, scale=scale, size=n)

# One-sample KS test compares x to a CDF (given as a callable function)
stats.kstest(
    x,
    partial(stats.weibull_min.cdf, c=c, loc=loc, scale=scale)
)
#> KstestResult(statistic=0.0054, pvalue=0.9352)

# Two-sample KS test compares x to another sample (here from the same distribution)
stats.kstest(
    x,
    stats.weibull_min.rvs(c, loc=loc, scale=scale, size=n)
)
#> KstestResult(statistic=0.0094, pvalue=0.9291) 
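
As a side note, functools.partial is not the only way to hand kstest the CDF. Assuming the same parameters as above, a frozen distribution or the distribution's name plus its parameters also works; this is just a sketch, and both calls are equivalent to the one-sample test above.

import scipy.stats as stats

c, loc, scale = 2.34, 0, 1
x = stats.weibull_min.rvs(c, loc=loc, scale=scale, size=10_000)

# Freeze the distribution and pass its bound cdf method
stats.kstest(x, stats.weibull_min(c, loc=loc, scale=scale).cdf)

# Or pass the distribution name as a string plus its parameters via args
stats.kstest(x, "weibull_min", args=(c, loc, scale))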

@Dave is correct that with hypothesis testing we don't accept the null hypothesis; we can only reject it or fail to reject it. The point is that "not reject" is not the same as "accept".

On the other hand, it sounds a bit awkward to say "we have a sample of 10,000 but we simply have insufficient evidence to conclude anything". At this sample size we expect that estimates are precise (have small variance).
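
To make that concrete, here is a small simulation sketch (the repetition count is my choice; the Weibull shape parameter is from the question): when both samples really do come from the same distribution, the two-sample KS statistic at n = 10,000 is tightly concentrated near zero.

import numpy as np
import scipy.stats as stats

c, n, reps = 2.34, 10_000, 200

# Two-sample KS statistics when both samples come from the same Weibull distribution
null_stats = [
    stats.kstest(
        stats.weibull_min.rvs(c, size=n),
        stats.weibull_min.rvs(c, size=n),
    ).statistic
    for _ in range(reps)
]

print(np.mean(null_stats), np.max(null_stats))
# Typically the statistics sit around 0.01 and rarely get much larger,
# i.e. at this sample size the two empirical CDFs are estimated very precisely.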

Note that this situation is a bit hypothetical. In practice we rarely know the true distribution, or that two large samples come from the same distribution, as they do in this simulation. So in the real world, at sample sizes on the order of 10k, it's more likely that the p-value is small, not large.

So do we learn anything if the sample size is large and the p-value is large?

  • We learn that the significance level α = 0.05 doesn't make sense for large data. Keeping α fixed while n grows implies we are looking for smaller and smaller effects (see the sketch after this list).
  • And we learn that — while we cannot accept the null hypothesis as true — the evidence is consistent both with "no effect" and with "trivial effect". If we have chosen the sample size so that we have enough power to detect differences of interest to us, then we also have a good idea what "trivial" means.
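
A back-of-the-envelope sketch of that first point: using the usual large-sample approximation for the one-sample KS critical value at α = 0.05 (roughly 1.36 / √n), the CDF distance the test flags as significant keeps shrinking as n grows.

import numpy as np

# Approximate one-sample KS critical value at alpha = 0.05: D_crit ~ 1.36 / sqrt(n)
for n in (100, 1_000, 10_000, 100_000):
    print(n, 1.36 / np.sqrt(n))
# roughly 0.136, 0.043, 0.0136, 0.0043 for n = 100, 1_000, 10_000, 100_000
# So at n = 10,000 a deviation of about 0.014 between the CDFs is already enough
# to reject, which may well be a practically trivial difference.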

You can read more in the thread "Are large data sets inappropriate for hypothesis testing?".