[Math] KL divergence between Bernoulli Distribution with parameter $p$ and Gaussian Distribution

real-analysis, statistics

I am trying to find the Kullback–Leibler divergence between a Bernoulli distribution on the two points $T, -T$ with parameter $p$ and a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. My attempt is as follows:

Let
$$
b(x) = q\delta(x-T)+p\delta(x+T) \sim \text{Bernoulli}(p), \quad q = 1-p, \\ g(x) \sim N(\mu, \sigma^2).
$$

$$
\begin{align}
D(b||g) &= \int_{-\infty}^{\infty}b(x)\log \left( \frac{b(x)}{g(x)}\right) dx \\
&=\int_{-\infty}^{\infty}b(x)\log \left( b(x) \right) dx - \int_{-\infty}^{\infty}b(x)\log \left( g(x) \right) dx \\
&=A-B
\end{align}
$$
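
Note that the term $B$ on its own is finite and can be written out explicitly, since $g$ is just the $N(\mu, \sigma^2)$ density (a short computation, with $q = 1-p$ as above):

$$
\begin{align}
B &= \int_{-\infty}^{\infty}b(x)\log \left( g(x) \right) dx = q\log g(T) + p\log g(-T) \\
&= -\tfrac{1}{2}\log\left(2\pi\sigma^2\right) - \frac{q(T-\mu)^2 + p(T+\mu)^2}{2\sigma^2}.
\end{align}
$$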

My questions are as follows:

  1. Can I use the continuous representation of the Bernoulli RV with the help of $\delta(\cdot)$ functions, where $\delta(\cdot)$ is the Dirac delta function?
  2. Does $A$ exist? On the set $\mathbb{R}\setminus\{\pm T\}$ we have $\delta(x \mp T) = 0$, so $\log(\delta(x \mp T)) = -\infty$.
  3. If we cannot calculate the KLD between a continuous and a discrete random variable, what is the KLD analogue for this case? My thought was that $B$ alone can serve as a distance. For example, if we want to measure the distance of $b(x)$ from two different Gaussian distributions $g_1(x), g_2(x)$, only $B$ depends on $g_1(x)$ or $g_2(x)$, and thus can contribute to KLD.

Best Answer

  1. No, you cannot do this. The Kullback-Leibler divergence $D_{KL}(P\|Q)$ is defined only if $P\ll Q$. This means that no set of positive $P$-measure can have zero $Q$-measure. In your case this fails: the point masses of the Bernoulli distribution, i.e. the set $\{T, -T\}$, have positive (in fact full) $P$-measure but zero measure under the Gaussian distribution.
  2. The integral $A$ blows up. For a continuous distribution it would be the negative of the continuous version of the Shannon entropy (the differential entropy); a heuristic check with smoothed point masses is sketched after this list.
  3. I suspect that you might be looking for the mutual information between parameter space and observation space. It is a common technique to try to maximize the mutual information in such settings. The mutual information is then equal to the expected Kullback-Leibler divergence of the posterior distribution on the parameter space (given the observations) from the prior distribution (written out in symbols below). Here, the requirement that $P\ll Q$ simply means that one is not allowed to make any conclusions that are a priori impossible!
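
One heuristic way to see that $A$ blows up: replace each point mass in $b$ by a narrow Gaussian, writing $\varphi_{m,\varepsilon}$ for the $N(m, \varepsilon^2)$ density and taking $b_\varepsilon(x) = q\,\varphi_{T,\varepsilon}(x) + p\,\varphi_{-T,\varepsilon}(x)$ with $\varepsilon \ll T$. For such well-separated components the differential entropy is approximately
$$
h(b_\varepsilon) \approx -p\log p - q\log q + \tfrac{1}{2}\log\left(2\pi e\,\varepsilon^2\right),
$$
which tends to $-\infty$ as $\varepsilon \to 0$, so $A_\varepsilon = -h(b_\varepsilon) \to +\infty$.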
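
In symbols, with a prior $p(\theta)$ on the parameter $\Theta$ and posterior $p(\theta \mid x)$ given the observation $X$ (generic notation, not tied to the Bernoulli/Gaussian pair above), the identity behind point 3 reads
$$
I(\Theta; X) = \mathbb{E}_{X}\left[ D\big( p(\theta \mid X) \,\big\|\, p(\theta) \big) \right] = \int p(x)\int p(\theta \mid x)\,\log\frac{p(\theta \mid x)}{p(\theta)}\, d\theta\, dx .
$$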