Kullback–Leibler Divergence – Intuition on Understanding Kullback–Leibler (KL) Divergence

distance, distributions, intuition, kullback-leibler

I have learned about the intuition behind the KL Divergence as a measure of how much a model distribution differs from the theoretical/true distribution of the data. The source I am reading goes on to say that the intuitive understanding of 'distance' between these two distributions is helpful, but should not be taken literally, because for two distributions $P$ and $Q$ the KL Divergence is not symmetric in $P$ and $Q$.

I am not sure how to understand the last statement. Is this where the intuition of 'distance' breaks down?

I would appreciate a simple, but insightful example.

Best Answer

A (metric) distance $D$ must be symmetric, i.e. $D(P,Q) = D(Q,P)$, but by definition the KL divergence is not.
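For a discrete sample space $\Omega$, the definition is

$$KL(P,Q) = \sum_{x \in \Omega} P(x)\log \frac{P(x)}{Q(x)},$$

which is clearly not symmetric in $P$ and $Q$. (The numerical values below use the natural logarithm.)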

Example: $\Omega = \{A,B\}$, $P(A) = 0.2, P(B) = 0.8$, $Q(A) = Q(B) = 0.5$.

We have:

$$KL(P,Q) = P(A)\log \frac{P(A)}{Q(A)} + P(B) \log \frac{P(B)}{Q(B)} \approx 0.19$$

and

$$KL(Q,P) = Q(A)\log \frac{Q(A)}{P(A)} + Q(B) \log \frac{Q(B)}{P(B)} \approx 0.22$$

thus $KL(P,Q) \neq KL(Q,P)$ and therefore $KL$ is not a (metric) distance.
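If you want to check the arithmetic yourself, here is a minimal Python sketch (assuming natural logarithms, which is what the values $\approx 0.19$ and $\approx 0.22$ above correspond to):

```python
import numpy as np
from scipy.stats import entropy  # entropy(pk, qk) returns KL(pk || qk) in nats

# The two distributions from the example above
P = np.array([0.2, 0.8])   # P(A), P(B)
Q = np.array([0.5, 0.5])   # Q(A), Q(B)

# Direct computation from the definition, using natural logs
kl_pq = np.sum(P * np.log(P / Q))
kl_qp = np.sum(Q * np.log(Q / P))
print(kl_pq, kl_qp)              # ~0.1927 and ~0.2231 -- not equal

# The same values via scipy
print(entropy(P, Q), entropy(Q, P))
```

Both computations confirm that swapping the arguments changes the value, which is exactly why KL divergence fails the symmetry requirement of a metric.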