No, the square root of the symmetrised KL divergence is not a metric. A counterexample is as follows:
- Let $P$ be a coin that produces a head 10% of the time.
- Let $Q$ be a coin that produces a head 20% of the time.
- Let $R$ be a coin that produces a head 30% of the time.
- Then $d(P, Q) + d(Q, R) = 0.284... + 0.232... < 0.519... = d(P, R)$.
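A quick numerical check of this counterexample (a minimal sketch of my own; the function name is mine, and I take the symmetrised divergence to be $KL(P\|Q) + KL(Q\|P)$, which is what the quoted numbers correspond to):
import numpy as np

def sqrt_sym_kl(p, q):
    # square root of the symmetrised KL divergence KL(P||Q) + KL(Q||P)
    return np.sqrt(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

P = np.array([0.1, 0.9])
Q = np.array([0.2, 0.8])
R = np.array([0.3, 0.7])

print(sqrt_sym_kl(P, Q), sqrt_sym_kl(Q, R), sqrt_sym_kl(P, R))    # cf. 0.284..., 0.232..., 0.519... above
print(sqrt_sym_kl(P, Q) + sqrt_sym_kl(Q, R) < sqrt_sym_kl(P, R))  # True: the triangle inequality fails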
However, for $P$ and $Q$ very close together, $D(P, Q)$, $J(P, Q)$ and $S(P, Q)$ are essentially the same (they are proportional to one another, up to $O((P-Q)^3)$ terms), and their square roots are metrics to the same order. We can take this local metric and integrate it up over the whole space of probability distributions to obtain a global metric. The result is:
$$A(P, Q) = \cos^{-1}\left(\sum_x \sqrt{P(x)Q(x)} \right)$$
I worked this out myself, so I'm afraid I do not know what it is called. I will use A for Alistair until I find out. ;-)
By construction, the triangle inequality in this metric is tight. You can actually find a unique shortest path through the space of probability distributions from $P$ to $Q$ that has the right length. In that respect it is preferable to the otherwise similar Hellinger distance:
$$H(P, Q) = \sqrt{1 - \sum_x \sqrt{P(x)Q(x)} }$$
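As a small illustration (a sketch of my own, using the formulas above and the three coins from the counterexample; the function names are mine): for these coins, which all lie on the same one-parameter family, the triangle inequality for $A$ is numerically an equality, while for $H$ it is strict.
import numpy as np

def bhattacharyya_angle(p, q):
    # A(P, Q) = arccos(sum_x sqrt(P(x) Q(x)))
    return np.arccos(np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0))

def hellinger(p, q):
    # H(P, Q) = sqrt(1 - sum_x sqrt(P(x) Q(x)))
    return np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q))))

P, Q, R = np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.3, 0.7])

print(bhattacharyya_angle(P, Q) + bhattacharyya_angle(Q, R), bhattacharyya_angle(P, R))  # essentially equal: the coins lie on a geodesic
print(hellinger(P, Q) + hellinger(Q, R), hellinger(P, R))                                # strict inequality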
Update 2013-12-05: Apparently this is called the Bhattacharyya arc-cos distance.
Intuition
Kullback-Leibler divergence can be interpreted to mean
how many bits of information we expect to lose if we use $Q$ instead of $P$.
Thus the Population Stability Index is the "roundtrip loss":
how many bits of information we expect to lose if we use $Q$ instead of $P$, plus how many we expect to lose going back, using $P$ instead of $Q$.
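Written out with the usual binned definition of the PSI, this roundtrip reading is exactly the symmetrised KL divergence:
$$
\mathrm{PSI}(P,Q) = \sum_x \left(P(x) - Q(x)\right)\ln\frac{P(x)}{Q(x)} = D(P\|Q) + D(Q\|P).
$$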
Values
It appears that the Population Stability Index is closely related to the G-test:
$$
\mathrm{PSI}(P,Q) = \frac{G(P,Q) + G(Q,P)}{2N}
$$
(and thus can be computed using scipy.stats.power_divergence, as well as directly).
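Here is a sketch of that computation (my own example distributions and a hypothetical sample size $N$), checking the identity above by computing the PSI directly and via the G-test statistic from scipy.stats.power_divergence:
import numpy as np
from scipy.stats import power_divergence

P = np.array([0.1, 0.9])   # hypothetical "expected" distribution
Q = np.array([0.2, 0.8])   # hypothetical "actual" distribution
N = 1000                   # hypothetical sample size

# direct computation: PSI = sum (P - Q) * ln(P / Q)
psi_direct = np.sum((P - Q) * np.log(P / Q))

# G(P, Q) = 2 * sum(N*P * ln(P/Q)), the G-test statistic on counts N*P vs N*Q
g_pq = power_divergence(N * P, N * Q, lambda_="log-likelihood").statistic
g_qp = power_divergence(N * Q, N * P, lambda_="log-likelihood").statistic

print(psi_direct, (g_pq + g_qp) / (2 * N))  # the two values agree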
Therefore the p-values corresponding to PSI can be computed using the $\chi^2$ distribution:
import scipy.stats as st

# tail probability of the chi-squared distribution at the PSI value, for DF = 1, 2, 3
print(" ", " ".join("DF=%d" % df for df in [1, 2, 3]))
for psi in [0.1, 0.25]:
    print("PSI=%.2f %s" % (psi, "".join(
        " %5f" % st.distributions.chi2.sf(psi, df) for df in [1, 2, 3])))
DF=1 DF=2 DF=3
PSI=0.10 0.751830 0.951229 0.991837
PSI=0.25 0.617075 0.882497 0.969140
Here PSI is the Population Stability Index and DF is the number of degrees of freedom ($\mathrm{DF}=n-1$, where $n$ is the number of distinct values that the variable takes).
Interestingly enough, the official "interpretation" of the PSI value completely ignores DF.
Best Answer
Set aside Kullback-Leibler divergence for a moment and consider the following: it's perfectly possible for the Kolmogorov-Smirnov p-value to be small even though the corresponding Kolmogorov-Smirnov distance is small.
Specifically, that can easily happen with large sample sizes, where even small differences are still larger than we'd expect to see from random variation.
The same will naturally tend to happen when comparing some other suitable measure of divergence with the Kolmogorov-Smirnov p-value; it will quite naturally occur at large sample sizes.
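To illustrate (a simulation sketch of my own, with an arbitrary small mean shift and seed):
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

for n in [100, 10_000, 1_000_000]:
    x = rng.normal(0.00, 1.0, size=n)
    y = rng.normal(0.05, 1.0, size=n)   # a small, fixed difference between the two samples
    stat, p = ks_2samp(x, y)
    print("n=%9d  KS distance=%.4f  p-value=%.3g" % (n, stat, p))
# The KS distance stays small, but the p-value collapses as n grows.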
[If you don't wish to confound the distinction between Kolmogorov-Smirnov distance and p-value with the difference in what the two things are looking at, it might be better to explore the differences in the two measures ($D_{KS}$ and $D_{KL}$) directly, but that's not what is being asked here.]