Solved – Jensen-Shannon divergence for bivariate normal distributions

distance-functions, information theory, normal distribution

Given two bivariate normal distributions $P \equiv \mathcal{N}(\mu_p, \Sigma_p)$ and $Q \equiv \mathcal{N}(\mu_q, \Sigma_q)$, I am trying to calculate the Jensen-Shannon divergence between them, defined (for the discrete case) as:
$JSD(P\|Q) = \frac{1}{2} (KLD(P\|M)+ KLD(Q\|M))$
where $KLD$ is the Kullback-Leibler divergence, and $M=\frac{1}{2}(P+Q)$
I've found how to calculate $KLD$ in terms of the distributions' parameters, and thus $JSD$.
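For concreteness, here is a rough sketch of the closed-form $KLD$ between two multivariate normals that I'm using (NumPy; the function and variable names are just mine, and the result is in nats):

```python
import numpy as np

def kl_mvn(mu_p, Sigma_p, mu_q, Sigma_q):
    """KL(P || Q) in nats for P = N(mu_p, Sigma_p) and Q = N(mu_q, Sigma_q)."""
    k = mu_p.shape[0]
    Sigma_q_inv = np.linalg.inv(Sigma_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(Sigma_q_inv @ Sigma_p)
                  + diff @ Sigma_q_inv @ diff
                  - k
                  + np.log(np.linalg.det(Sigma_q) / np.linalg.det(Sigma_p)))
```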

My doubts are:

  1. To calculate $M$, I just did $M \equiv \mathcal{N}(\frac{1}{2}(\mu_p + \mu_q), \frac{1}{2}(\Sigma_p + \Sigma_q))$. Is this right?

  2. I've read in [1] that the $JSD$ is bounded, but that doesn't appear to hold when I calculate it as described above for normal distributions. Does this mean I am calculating it wrong, violating an assumption, or is there something else I don't understand?

Best Answer

The midpoint measure $\newcommand{\bx}{\mathbf{x}} \newcommand{\KL}{\mathrm{KL}}M$ is a mixture distribution of the two multivariate normals, so it does not have the form that you give in the original post. Let $\varphi_p(\bx)$ be the probability density function of a $\mathcal{N}(\mu_p, \Sigma_p)$ random vector and $\varphi_q(\bx)$ be the pdf of $\mathcal{N}(\mu_q, \Sigma_q)$. Then the pdf of the midpoint measure is $$ \varphi_m(\bx) = \frac{1}{2} \varphi_p(\bx) + \frac{1}{2} \varphi_q(\bx) \> . $$
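In code, the midpoint density is just the pointwise average of the two component densities. A minimal sketch, assuming SciPy's `multivariate_normal` (the function name is illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def midpoint_pdf(x, mu_p, Sigma_p, mu_q, Sigma_q):
    """Density of the midpoint (mixture) measure M = (P + Q)/2 at x."""
    return 0.5 * multivariate_normal.pdf(x, mean=mu_p, cov=Sigma_p) \
         + 0.5 * multivariate_normal.pdf(x, mean=mu_q, cov=Sigma_q)
```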

The Jensen-Shannon divergence is $$ \mathrm{JSD} = \frac{1}{2} (\KL(P\|M) + \KL(Q\|M)) = h(M) - \frac{1}{2} (h(P) + h(Q)) \>, $$ where $h(P)$ denotes the (differential) entropy corresponding to the measure $P$.

Thus, your calculation reduces to calculating differential entropies. For the multivariate normal $\mathcal{N}(\mu, \Sigma)$, the answer is well-known to be $$ \frac{1}{2} \log_2\big((2\pi e)^n |\Sigma|\big) $$ and the proof can be found in any number of sources, e.g., Cover and Thomas (1991), pp. 230-231. It is worth pointing out that the entropy of a multivariate normal is invariant with respect to the mean, as the expression above shows. However, this almost assuredly does not carry over to the case of a mixture of normals. (Think about picking one broad normal centered at zero and another concentrated normal where the latter is pushed out far away from the origin.)
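As a sketch of that expression (in bits, matching the $\log_2$ above; the function name is mine):

```python
import numpy as np

def mvn_entropy_bits(Sigma):
    """Differential entropy (bits) of N(mu, Sigma); note it does not depend on mu."""
    n = Sigma.shape[0]
    return 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(Sigma))
```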

For the midpoint measure, things appear to be more complicated. As far as I know, there is no closed-form expression for the differential entropy $h(M)$. Searching on Google yields a couple of potential hits, but the top ones don't appear to give closed forms in the general case. You may be stuck with approximating this quantity in some way.
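One simple way to approximate it (a sketch, not part of the original answer) is plain Monte Carlo: draw samples from the mixture, average $-\log_2 \varphi_m$, and combine the result with the closed-form entropies of $P$ and $Q$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def jsd_mc(mu_p, Sigma_p, mu_q, Sigma_q, n_samples=100_000, seed=0):
    """Monte Carlo estimate of JSD(P || Q) in bits for two (bi)variate normals."""
    rng = np.random.default_rng(seed)
    P = multivariate_normal(mean=mu_p, cov=Sigma_p)
    Q = multivariate_normal(mean=mu_q, cov=Sigma_q)

    # Sample from the mixture M: pick P or Q with probability 1/2 for each draw.
    from_p = rng.random(n_samples) < 0.5
    x = np.where(from_p[:, None],
                 P.rvs(n_samples, random_state=rng),
                 Q.rvs(n_samples, random_state=rng))

    # h(M) is approximated by E_M[-log2 phi_m(X)] over the mixture samples.
    h_m = -np.mean(np.log2(0.5 * P.pdf(x) + 0.5 * Q.pdf(x)))

    # Closed-form differential entropies of P and Q, in bits.
    n = len(mu_p)
    h_p = 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(Sigma_p))
    h_q = 0.5 * np.log2((2 * np.pi * np.e) ** n * np.linalg.det(Sigma_q))

    return h_m - 0.5 * (h_p + h_q)
```

Measured in bits, the estimate should stay below 1, which gives a quick sanity check against the boundedness result discussed below.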

Note also that the paper you reference does not restrict its treatment to discrete distributions; it handles a case general enough that your problem falls within its framework. See the middle of the second column on page 1859, where it is also shown that the divergence is bounded. That result holds for two general measures and is not restricted to two discrete distributions.

The Jensen-Shannon Divergence has come up a couple of times recently in other questions on this site. See here and here.


Addendum: Note that a mixture of normals is not the same as a linear combination of normals. The simplest way to see this is to consider the one-dimensional case. Let $X_1 \sim \mathcal{N}(-\mu, 1)$ and $X_2 \sim \mathcal{N}(\mu, 1)$ and let them be independent of one another. Then a mixture of the two normals using weights $(\alpha, 1-\alpha)$ for $\alpha \in (0,1)$ has the distribution $$ \varphi_m(x) = \alpha \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{(x+\mu)^2}{2}} + (1-\alpha) \cdot \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2}} \> . $$

The distribution of a linear combination of $X_1$ and $X_2$ using the same weights as before is, via the stability of the normal distribution under linear combinations, $$ \varphi_{\ell}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-(1-2\alpha)\mu)^2}{2\sigma^2}} \>, $$ where $\sigma^2 = \alpha^2 + (1-\alpha)^2$.

These two distributions are very different, though they have the same mean. This is not an accident and follows from linearity of expectation.
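A quick simulation (again a sketch, not part of the original answer) makes the contrast concrete: both constructions have mean $(1-2\alpha)\mu$, but their spreads are very different.

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, alpha = 100_000, 3.0, 0.3

x1 = rng.normal(-mu, 1.0, n)   # X1 ~ N(-mu, 1)
x2 = rng.normal(mu, 1.0, n)    # X2 ~ N(mu, 1), independent of X1

# Mixture: keep a realization of X1 with probability alpha, of X2 otherwise.
pick = rng.random(n) < alpha
mixture = np.where(pick, x1, x2)

# Linear combination: alpha * X1 + (1 - alpha) * X2.
linear = alpha * x1 + (1 - alpha) * x2

print(mixture.mean(), linear.mean())  # both close to (1 - 2*alpha) * mu = 1.2
print(mixture.var(), linear.var())    # roughly 8.6 versus roughly 0.58
```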

To understand the mixture distribution, imagine that you had to go to a statistical consultant so that she could produce values from this distribution for you. She holds one realization of $X_1$ in one palm and one realization of $X_2$ in the other palm (though you don't know which of the two palms each is in). Now, her assistant flips a biased coin with probability $\alpha$ out of sight of you and then comes and whispers the result into the statistician's ear. She opens one of her palms and shows you the realization, but doesn't tell you the outcome of the coin flip. This process produces the mixture distribution.

On the other hand, the linear combination can be understood in the same context. The statistical consultant simply takes both realizations, multiplies the first by $\alpha$ and the second by $(1-\alpha)$, adds them together, and shows you the sum.