Solved – Jensen-Shannon Divergence for multiple probability distributions

distributions, entropy, information theory, kullback-leibler, python

What is the correct mathematical expression for computing Jensen-Shannon divergence ($JSD$) between multiple probability distributions?

I found the following expression on Wikipedia, but I did not find any official reference:

$$JSD(p^1,\dots,p^m)=H\left(\frac{p^1+\dots+p^m}{m}\right)-\frac{\sum_{j=1}^{m} H(p^j)}{m}$$

where $H$ is the Shannon entropy.
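
For $m=2$, expanding the Kullback-Leibler terms shows that this expression agrees with the familiar two-distribution form, with mixture $M=\frac{P+Q}{2}$:

$$H\left(\frac{P+Q}{2}\right)-\frac{H(P)+H(Q)}{2}=\frac{1}{2}KL(P||M)+\frac{1}{2}KL(Q||M)$$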

  • What is the intuition behind the divergence in the multiple-distribution case?
  • In the case of two univariate distributions $P$ and $Q$, $JSD(P||Q)$ is interpreted as a distance between them. So in the multi-distribution case, is it measuring the distance between two multivariate distributions?
  • Is there an efficient Python code available for it?

Best Answer

I found one paper that uses the JS divergence of multiple distributions to estimate the hardness of a query (in the area of Information Retrieval). The paper can be found here, and it in turn refers to a paper called "Divergence Measures Based on the Shannon Entropy" (1991). They also give a slightly different mathematical expression for it.

As for the interpretation, they explain it as follows:

Given a set of distributions thus obtained, we employ the well known Jensen-Shannon divergence [8] to measure the diversity of the distributions corresponding [...] So the Jensen-Shannon divergence can be seen to measure the overall diversity between all the probability distributions.

As for the Python code, I couldn't find any package that implements the JSD for more than two distributions, but there is already a quite straightforward code example on crossvalidated (see here).
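
For reference, here is a minimal sketch of the entropy-based formula from the question, assuming each distribution is given as a discrete probability vector that sums to 1 (it uses NumPy and scipy.stats.entropy; the function name jsd and the toy data are just for illustration):

```python
import numpy as np
from scipy.stats import entropy  # Shannon entropy of a discrete distribution

def jsd(distributions, base=2):
    """Jensen-Shannon divergence of m discrete distributions (uniform weights)."""
    p = np.asarray(distributions, dtype=float)  # shape (m, k): m distributions over k outcomes
    mixture = p.mean(axis=0)                    # (p^1 + ... + p^m) / m
    # H(mixture) minus the average entropy of the individual distributions
    return entropy(mixture, base=base) - np.mean([entropy(p_j, base=base) for p_j in p])

# toy example: three distributions over three outcomes
dists = [[0.5, 0.3, 0.2],
         [0.1, 0.6, 0.3],
         [0.3, 0.3, 0.4]]
print(jsd(dists))  # 0 iff all distributions are identical; at most log_2(m) bits
```

With base=2 the result is measured in bits; it is $0$ when all distributions coincide and, for uniform weights, is bounded above by $\log_2 m$, which matches the "overall diversity" reading quoted above.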