Solved – KL divergence invariant to affine transformation

kullback-leibler, mathematical-statistics, normal distribution

I read in this tutorial on page 20 that $KL$ divergence is invariant to affine transformation, but I think it is incorrect.

Say we have two 1D normal distributions $P_{1}(x) = \mathcal N(\mu_{1}, \sigma_{1}^2)$ and $P_{2}(x) = \mathcal N(\mu_{2}, \sigma_{2}^2)$. Then $$KL(P_1(x)\|P_{2}(x))= E_{1}\left(\ln \frac{P_{1}(x)}{P_{2}(x)}\right) = \ln\left(\frac{\sigma_{2}}{\sigma_{1}}\right) + \frac{1}{2\sigma_2^2}\left(\sigma_1^2+(\mu_1-\mu_2)^2\right)-\frac{1}{2}$$

If we define an affine transformation as $$x^{'} = \mu_1 + \frac{1}{\sigma}(x - \mu_1)$$

We will have
$$P_1(x^{'}) = \sigma P_1(x = \mu_1+ \sigma(x' - \mu_1)) = \mathcal N(\mu_1, \frac{\sigma_1^2}{\sigma^2})$$ and
$$P_2(x^{'}) = \sigma P_2(x = \mu_1+ \sigma(x' - \mu_1)) = \mathcal N(\mu_1-\frac{1}{\sigma}(\mu_1-\mu_2), \frac{\sigma_2^2}{\sigma^2})$$
Then, the $KL$ divergence for the two transformed distributions is $$KL(P_1(x')\|P_2(x')) = E'_1(\ln \frac{P_1(x')}{P_2(x')}) = \ln (\frac{\sigma_{2}}{\sigma_{1}}) + \frac{1}{2\sigma_2^2}(\sigma^2 \sigma_1^2+(\mu_1-\mu_2)^2)-\frac{\sigma^2}{2}$$

So clearly, for such a simple case $KL$ divergence is not invariant.

However, the invariance of $KL$ divergence under affine transformations is crucial for the proof in the tutorial that I referred to.

So, have I misunderstood something?

EDIT:

I think part of my misunderstanding lies in the way that I calculate $P_1(x')$ and $P_2(x')$. So I will expand this part so others can see where I got it wrong.
$$P_1(x') = \sigma P_1(x) = \sigma P_1(\mu_1+\sigma (x'-\mu_1))$$
given that $$P_1(x)=\mathcal N(\mu_1, \sigma_1^2)$$
so,
$$\sigma P_1(\mu_1+\sigma (x'-\mu_1)) = \sigma \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{1}{2\sigma_1^2}(\sigma (x' - \mu_1))^2} = \frac{1}{\sqrt{2\pi} \frac{\sigma_1}{\sigma}} e^{-\frac{1}{2\frac{\sigma_1^2}{\sigma^2}}(x' - \mu_1)^2} = \mathcal N(\mu_1, \frac{\sigma_1^2}{\sigma^2})$$
Then, in exactly the same way, I have $$P_2(x^{'}) = \sigma P_2(x = \mu_1+ \sigma(x' - \mu_1)) = \mathcal N(\mu_1-\frac{1}{\sigma}(\mu_1-\mu_2), \frac{\sigma_2^2}{\sigma^2})$$

Is there any problem with this?

Best Answer

There are a few mistakes in your math. For example, when you expand the expectation, it seems you dropped the integral and also the $P_1(x)$ term.

Write $y(x) = mx + c$. Recall that $P(x) dx = P(y) dy$. Since $dy/dx = m$, this gives $P(x) = mP(y)$ (taking $m > 0$; in general the factor is $|m|$).

Then we can follow the proof from Wikipedia, which shows that KL divergence is invariant under this kind of transformation.
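
As a sketch of the standard change-of-variables argument (written here in the notation of this post, assuming $m > 0$ so that $P_i(x) = mP_i(y)$ and $dx = dy/m$):

$$KL(P_1(x)\|P_2(x)) = \int P_1(x) \ln \frac{P_1(x)}{P_2(x)}\,dx = \int m P_1(y) \ln \frac{m P_1(y)}{m P_2(y)}\,\frac{dy}{m} = \int P_1(y) \ln \frac{P_1(y)}{P_2(y)}\,dy = KL(P_1(y)\|P_2(y))$$

The factor $m$ cancels inside the logarithm and against $dx = dy/m$. For the Gaussians in the question, the map has $m = 1/\sigma$, so the transformed standard deviations are $\sigma_1/\sigma$ and $\sigma_2/\sigma$ and the means differ by $(\mu_1-\mu_2)/\sigma$. Plugging these into the closed form gives

$$\ln\left(\frac{\sigma_2/\sigma}{\sigma_1/\sigma}\right) + \frac{1}{2\sigma_2^2/\sigma^2}\left(\frac{\sigma_1^2}{\sigma^2} + \frac{(\mu_1-\mu_2)^2}{\sigma^2}\right) - \frac{1}{2} = \ln\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2}\left(\sigma_1^2+(\mu_1-\mu_2)^2\right) - \frac{1}{2}$$

so every $\sigma$ cancels and the result is the same as before the transformation.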