Statistics – Additive Property of Kullback–Leibler Divergence


I was looking at the lemma on KL divergence, but could not figure out how the additive property of KL divergence is derived.
For reference, the KL divergence, which is a measure of the difference between two probability distributions $P$ and $Q$, is defined as $D_{KL}(P||Q)=\sum_i \log{\frac{P(i)}{Q(i)}}P(i)$, where $i$ ranges over the support of $P$ and $Q$. Now, as mentioned on Wikipedia:
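To make the definition concrete, here is a minimal numerical sketch of that formula; the distributions `P` and `Q` are hypothetical examples on a shared three-point support, not anything from the question.

```python
import numpy as np

# Hypothetical example pmfs P and Q on the same three-point support.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.4, 0.4, 0.2])

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p(i) * log(p(i) / q(i)), assuming p, q > 0 on the support."""
    return np.sum(p * np.log(p / q))

print(kl_divergence(P, Q))  # a small nonnegative number
```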

The Kullback–Leibler divergence is additive for independent distributions in much the same way as Shannon entropy. If $P_1, P_2$ are independent distributions, with the joint distribution $P(x,y)=P_{1}(x)P_{2}(y)$, and $Q, Q_1, Q_2$ likewise, then $D_{KL}(P||Q)=D_{KL}(P_1||Q_1)+D_{KL}(P_2||Q_2)$.

If I expand the joint distributions and split up the $\log$ in the definition, I get
$$D_{KL}(P||Q)=\sum_{x,y} P_1(x)P_2(y)\left[\log{\frac{P_1(x)}{Q_1(x)}}+\log{\frac{P_2(y)}{Q_2(y)}}\right]=D_{KL}(P_1||Q_1)\sum P_2 +D_{KL}(P_2||Q_2)\sum P_1,$$
but I don't see how this reduces to the stated result.

Best Answer

Remember that if you sum a probability mass function over all its values, the result must be one (it is an axiom of probability distributions). So, starting from your expression $D_{KL}(P||Q)=D_{KL}(P_1||Q_1)\sum P_2+D_{KL}(P_2||Q_2)\sum P_1$, and using $\sum P_1=1$ and $\sum P_2=1$, you get $D_{KL}(P||Q)=D_{KL}(P_1||Q_1)\sum P_2+D_{KL}(P_2||Q_2)\sum P_1=D_{KL}(P_1||Q_1)+D_{KL}(P_2||Q_2)$, which is the additive property.
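If it helps, here is a small sketch that checks the additive property numerically. The marginals `P1, Q1, P2, Q2` are hypothetical examples; any strictly positive pmfs on matching supports should work.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p(i) * log(p(i) / q(i))."""
    return np.sum(p * np.log(p / q))

# Hypothetical marginal distributions (assumed strictly positive).
P1, Q1 = np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])
P2, Q2 = np.array([0.6, 0.4]), np.array([0.7, 0.3])

# Joint distributions under independence: P(x, y) = P1(x) * P2(y), likewise for Q.
P = np.outer(P1, P2)
Q = np.outer(Q1, Q2)

lhs = kl_divergence(P.ravel(), Q.ravel())
rhs = kl_divergence(P1, Q1) + kl_divergence(P2, Q2)
print(np.isclose(lhs, rhs))  # True: D_KL(P||Q) = D_KL(P1||Q1) + D_KL(P2||Q2)
```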