[Math] KL divergence between two multivariate Bernoulli distributions

information theory, machine learning, probability distributions, statistics

The KL divergence between two Bernoulli distributions is:

$$
KL(p||q)_{Ber} = p\log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}
$$
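
For a quick numeric check, here is a minimal Python sketch of this closed form (the function name is mine, and it assumes $0 < p, q < 1$, sidestepping the $0\log 0 = 0$ convention at the endpoints):

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """KL(Ber(p) || Ber(q)) with natural log, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

print(bernoulli_kl(0.3, 0.5))  # ~0.0823
```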

According to my understanding, the KL divergence between two multivariate Bernoulli distributions $p$ and $q$ should be

$$
KL(p||q)_{Ber} = \sum_{i=1}^{k} \left[ p_i\log\frac{p_i}{q_i} + (1-p_i)\log\frac{1-p_i}{1-q_i} \right]
$$
where $k$ is the number of independent Bernoulli components (the dimension of the binary outcome vector).

Is this correct?

Best Answer

I guess you already answered your question, but it's just as you say.

If we assume that the outcomes are independent, we can express the total probability of an arbitrary binary vector as $$p(z)=\prod_{i=1}^k p_i(z_i),$$ where $z_i$ is the $i$-th component of $z$ and $p_i(\bullet)$ is the distribution of that component.
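
As a concrete illustration of this factorization, here is a small Python sketch (the function name and example values are mine) that evaluates $p(z)$ from the per-component success probabilities:

```python
import numpy as np

def joint_prob(z: np.ndarray, p: np.ndarray) -> float:
    """p(z) = prod_i p_i(z_i) for independent Bernoulli components,
    where p[i] = p_i(z_i = 1)."""
    return float(np.prod(np.where(z == 1, p, 1.0 - p)))

p = np.array([0.2, 0.7, 0.5])              # k = 3 components
print(joint_prob(np.array([1, 0, 1]), p))  # 0.2 * 0.3 * 0.5 = 0.03
```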

The KL divergence between two such distributions is $$D_{KL}(p||q) = \sum_zp(z)\log\frac{p(z)}{q(z)}.$$ Substituting the definitions of the distributions and rearranging terms, we obtain: $$D_{KL}(p||q) = \sum_{z}\prod_{i=1}^kp_i(z_i)\log\prod_{j=1}^k\frac{p_j(z_j)}{q_j(z_j)} = \sum_{z}\prod_{i=1}^kp_i(z_i)\sum_{j=1}^k\log\frac{p_j(z_j)}{q_j(z_j)}$$ $$= \sum_{j=1}^k\sum_{z}\prod_{i=1}^kp_i(z_i)\log\frac{p_j(z_j)}{q_j(z_j)}.$$

Now, let us use $z_{\setminus j}$ to denote the vector that results from removing the $j$-th component of $z$, and $p_{\setminus j}(\bullet)$ to denote its corresponding distribution (i.e., $p_{\setminus j}(z_{\setminus j})=\prod_{i\neq j}p_i(z_i)$). Then, we can rewrite the previous expression as $$D_{KL}(p||q) = \sum_{j=1}^k\sum_{z_{\setminus j},z_j}\prod_{i=1}^kp_i(z_i)\log\frac{p_j(z_j)}{q_j(z_j)} = \sum_{j=1}^k\sum_{z_{\setminus j},z_j}p_{\setminus j}(z_{\setminus j})\,p_j(z_j)\log\frac{p_j(z_j)}{q_j(z_j)}$$ $$= \sum_{j=1}^k\left(\sum_{z_{\setminus j}}p_{\setminus j}(z_{\setminus j})\right)\sum_{z_j}p_j(z_j)\log\frac{p_j(z_j)}{q_j(z_j)}.$$ The term in parentheses is just $1$, since it is the sum of a distribution over its whole space of possible outcomes. So the expression reduces to $$D_{KL}(p||q) = \sum_{j=1}^k\sum_{z_j}p_j(z_j)\log\frac{p_j(z_j)}{q_j(z_j)}.$$ Each inner sum runs over $z_j\in\{0,1\}$, so if we define $p_j:=p_j(z_j=1)$ and $q_j:=q_j(z_j=1)$, the final expression is just what you said: $$D_{KL}(p||q) = \sum_{j=1}^k\left[p_j\log\frac{p_j}{q_j} + (1-p_j)\log\frac{1-p_j}{1-q_j}\right].$$
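
To sanity-check the derivation numerically, here is a minimal Python sketch (function names and test vectors are mine; probabilities are assumed to lie strictly in $(0,1)$) comparing a brute-force sum over all $2^k$ binary vectors with the per-component closed form:

```python
import itertools
import numpy as np

def kl_brute_force(p: np.ndarray, q: np.ndarray) -> float:
    """D_KL(p || q) by summing p(z) log(p(z)/q(z)) over all 2^k binary vectors."""
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(p)):
        z = np.array(bits)
        pz = np.prod(np.where(z == 1, p, 1.0 - p))
        qz = np.prod(np.where(z == 1, q, 1.0 - q))
        total += pz * np.log(pz / qz)
    return float(total)

def kl_closed_form(p: np.ndarray, q: np.ndarray) -> float:
    """The per-component sum derived above."""
    return float(np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))))

p = np.array([0.2, 0.7, 0.5])
q = np.array([0.4, 0.6, 0.9])
print(kl_brute_force(p, q))  # ~0.6239
print(kl_closed_form(p, q))  # ~0.6239 (matches, as the derivation predicts)
```

The two values agree, which is exactly what the derivation predicts.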
