First, for convenience rewrite $J(p,p')$ as
\begin{align}
J(p,p') &= -D(p\Vert p')+\sum_{i}p_{i}\sum_{j}Q_{ij}\log\frac{Q_{ij}}{q_{j}}+\sum_{j}q_{j}\log\frac{q_{j}}{q'_{j}} \\
&= I({p})-\left[D(p\Vert p')-D(q\Vert q')\right]\\
& = I(p) - \Delta D(p\Vert p')
\end{align}
where I have used your definition of $I(p)$, and defined
\begin{align}
\Delta D(p\Vert p')= D(p\Vert p')-D(q\Vert q')
\end{align}
$\Delta D(p\Vert p')$ reflects the "contraction" of Kullback-Leibler divergence between $p$ and $p'$ under the stochastic matrix $Q$. For any fixed $p'$, $\Delta D$ is convex in the first argument. To see why, consider any two distributions $p$ and $\bar{p}$, and define the convex mixture $p^\alpha = (1-\alpha) p + \alpha \bar{p}$. We will show convexity by demonstrating that the second derivative with respect to $\alpha$ is non-negative at $\alpha=0$, for every choice of $p$ and $\bar{p}$.
First, writing $q^{\alpha}_{j}=\sum_{i}p^{\alpha}_{i}Q_{ij}$ for the output distribution induced by $p^{\alpha}$, compute the first derivative w.r.t. $\alpha$ as
\begin{align*}
& {\textstyle \frac{d}{d\alpha}}\Delta D(p^{\alpha}\Vert p') ={\textstyle \frac{d}{d\alpha}}\left[\sum_{i}p_{i}^{\alpha}\log p_{i}^{\alpha}-\sum_{i}p_{i}^{\alpha}\sum_{j}Q_{ij}\log q_{j}^{\alpha}+\sum_{i}p_{i}^{\alpha}\sum_{j}Q_{ij}\log\frac{q_{j}'}{p'_{i}}\right]\\
& =\sum_{i}(\bar{p}_{i}-p_{i})\log p_{i}^{\alpha}-\sum_{i}(\bar{p}_{i}-p_{i})\sum_{j}Q_{ij}\log q_{j}^{\alpha}+\sum_{i}(\bar{p}_{i}-p_{i})\sum_{j}Q_{ij}\log\frac{q_{j}'}{p'_{i}}
\end{align*}
Then, we compute the second derivative at $\alpha=0$ as
\begin{align}
{\textstyle \frac{d^{2}}{d\alpha^{2}}}\Delta D(p^{\alpha}\Vert p')\vert_{\alpha=0}=\sum_{i}\frac{\left(\bar{p}_{i}-p_{i}\right)^{2}}{p_{i}}-\sum_{j}\frac{\left(\bar{q}_{j}-q_{j}\right)^{2}}{q_{j}}
\end{align}
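In more detail, the step from the first derivative to the second uses that the third term of the first derivative does not depend on $\alpha$, together with $\bar{q}_{j}-q_{j}=\sum_{i}(\bar{p}_{i}-p_{i})Q_{ij}$:
\begin{align*}
{\textstyle \frac{d}{d\alpha}}\sum_{i}(\bar{p}_{i}-p_{i})\log p_{i}^{\alpha} & =\sum_{i}\frac{\left(\bar{p}_{i}-p_{i}\right)^{2}}{p_{i}^{\alpha}},\\
{\textstyle \frac{d}{d\alpha}}\sum_{i}(\bar{p}_{i}-p_{i})\sum_{j}Q_{ij}\log q_{j}^{\alpha} & =\sum_{j}\frac{\bar{q}_{j}-q_{j}}{q_{j}^{\alpha}}\sum_{i}(\bar{p}_{i}-p_{i})Q_{ij}=\sum_{j}\frac{\left(\bar{q}_{j}-q_{j}\right)^{2}}{q_{j}^{\alpha}},
\end{align*}
and evaluating both expressions at $\alpha=0$ gives the result above.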
$\sum_{i}\frac{\left(\bar{p}_{i}-p_{i}\right)^{2}}{p_{i}}$ is the so-called $\chi^2$ divergence of $\bar{p}$ from $p$, and $\sum_{j}\frac{\left(\bar{q}_{j}-q_{j}\right)^{2}}{q_{j}}$ is the $\chi^2$ divergence of $\bar{q}$ from $q$, i.e., the same divergence after both distributions are passed through $Q$.
Note that the $\chi^2$ divergence is a special case of an $f$-divergence, and therefore obeys a data-processing inequality (see e.g. Liese and Vajda, IEEE Trans. on Info. Theory, 2006, Thm. 14): since $\bar{q}$ and $q$ are obtained from $\bar{p}$ and $p$ by applying $Q$, the second sum is at most the first. In particular, that means that ${\textstyle \frac{d^{2}}{d\alpha^{2}}}\Delta D(p^{\alpha}\Vert p')\vert_{\alpha=0} \ge 0$, which establishes convexity in the first argument.
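If you want to see this numerically, here is a small sketch (assuming numpy; the helper chi2 is defined only for this check) that samples random distributions and a random row-stochastic $Q$ and verifies that applying $Q$ does not increase the $\chi^2$ divergence:

```python
import numpy as np

# Check numerically that chi2(Q^T p_bar, Q^T p) <= chi2(p_bar, p)
# for random distributions and a random row-stochastic Q.
rng = np.random.default_rng(0)

def chi2(a, b):
    """Chi-squared divergence sum_i (a_i - b_i)^2 / b_i."""
    return np.sum((a - b) ** 2 / b)

for _ in range(1000):
    p = rng.dirichlet(np.ones(4))
    p_bar = rng.dirichlet(np.ones(4))
    Q = rng.dirichlet(np.ones(3), size=4)   # rows of Q are distributions over 3 outputs
    assert chi2(Q.T @ p_bar, Q.T @ p) <= chi2(p_bar, p) + 1e-12
print("chi-squared data-processing inequality held in all trials")
```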
At the same time, $\Delta D(p\Vert p')$ is not convex in the second argument. Consider $p = (0.5,0.5,0)$, $r=(0.5,0.25,0.25)$, and $Q = \left( \begin{smallmatrix} 0.95 & 0.05 \\ 1 & 0 \\ 0 & 1\end{smallmatrix} \right)$. Here is a plot of $\Delta D$ where the first argument is $p$ and the second argument is the convex mixture $\alpha p + (1-\alpha) r$, for different values of $\alpha$:
(see code at https://pastebin.com/q8XLnGK8)
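For a self-contained version of the same computation (a minimal sketch; the helper names kl and delta_D are mine, and the linked pastebin script may differ in details), one can evaluate $\Delta D$ along the mixture path and look at its second differences:

```python
import numpy as np
from scipy.special import rel_entr

# Counterexample set-up from the text above.
p = np.array([0.5, 0.5, 0.0])
r = np.array([0.5, 0.25, 0.25])
Q = np.array([[0.95, 0.05],
              [1.0,  0.0],
              [0.0,  1.0]])                     # row-stochastic channel

def kl(a, b):
    """KL divergence D(a||b); rel_entr uses the 0*log(0/0) = 0 convention."""
    return rel_entr(a, b).sum()

def delta_D(p, p_prime):
    """Delta D(p||p') = D(p||p') - D(Q^T p || Q^T p')."""
    return kl(p, p_prime) - kl(Q.T @ p, Q.T @ p_prime)

alphas = np.linspace(0.0, 1.0, 201)
vals = np.array([delta_D(p, a * p + (1 - a) * r) for a in alphas])

# A convex (concave) function has non-negative (non-positive) second differences;
# here both signs occur, so Delta D is neither in its second argument.
second_diffs = vals[2:] - 2 * vals[1:-1] + vals[:-2]
print(second_diffs.min(), second_diffs.max())
```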
By visual inspection of the plot (equivalently, from the sign change in the second differences computed above), it can be verified that $\Delta D(p\Vert p')$ is neither convex nor concave in its second argument.
Regarding your questions:
(1) $I({p})$ is known to be concave in $p$ (Theorem 2.7.4 in Cover and Thomas, 2006). As we've shown $\Delta D(p\Vert p')$ is convex in $p$, so $-\Delta D(p\Vert p')$ is concave in $p$. Since the sum of concave functions is concave, $J(p,p') = I(p) - \Delta D(p\Vert p')$ is concave in $p$.
At the same time, as a function of the second argument $p'$,
$J(p,p') = \mathrm{const} - \Delta D(p\Vert p')$, and we've shown above that $\Delta D(p\Vert p')$ is neither convex nor concave in the second argument. Thus, $J(p,p')$ is not concave in the second argument.
(2) $\Delta D(p\Vert p') \ge 0$ by the data-processing inequality for KL divergence (Csiszár and Körner, 2011, Lemma 3.11). That means that
\begin{align}
J(p,p') = I(p) - \Delta D(p\Vert p') \le I(p) ,
\end{align}
from which $J(p,p') \le \max_s I(s)$ follows immediately.
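As a quick numerical illustration (a sketch, assuming numpy/scipy; not a proof), the non-negativity of $\Delta D$ can be checked for random $p$, $p'$, and row-stochastic $Q$:

```python
import numpy as np
from scipy.special import rel_entr

# Check Delta D(p||p') >= 0 for random p, p' and a random row-stochastic Q --
# the data-processing inequality behind J(p,p') <= I(p).
rng = np.random.default_rng(1)

def kl(a, b):
    return rel_entr(a, b).sum()

for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    p_prime = rng.dirichlet(np.ones(5))
    Q = rng.dirichlet(np.ones(3), size=5)   # 5x3 row-stochastic matrix
    delta = kl(p, p_prime) - kl(Q.T @ p, Q.T @ p_prime)
    assert delta >= -1e-12
print("Delta D was non-negative in all trials")
```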
Best Answer
Entropy cannot decrease when a doubly stochastic matrix $A=(a_{ij})$ is applied, i.e., $H(p')\geq H(p)$ for $p'=Ap$:
$$\begin{align}
H(p') &= -\sum_i\left(\sum_j a_{ij}p_j\right)\log\left(\sum_k a_{ik}p_k\right)\\
&\geq -\sum_i\sum_j a_{ij}p_j\log p_j \quad\because\text{Jensen's inequality: } -x\log x \text{ is concave and } \textstyle\sum_j a_{ij}=1\\
&= -\sum_j\left(\sum_i a_{ij}\right)p_j\log p_j\\
&= H(p) \quad\because\text{doubly stochastic matrix: } \textstyle\sum_i a_{ij}=1
\end{align}$$
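As a quick numerical sanity check (a sketch, assuming numpy; building $A$ from permutation matrices is just one convenient way to sample a doubly stochastic matrix):

```python
import numpy as np

# Build a doubly stochastic A as a convex combination of permutation matrices
# and check H(Ap) >= H(p) for random p.
rng = np.random.default_rng(2)

def entropy(x):
    x = x[x > 0]
    return -np.sum(x * np.log(x))

n = 4
perms = [np.eye(n)[rng.permutation(n)] for _ in range(5)]   # permutation matrices
weights = rng.dirichlet(np.ones(5))
A = sum(w * P for w, P in zip(weights, perms))              # doubly stochastic by construction

for _ in range(1000):
    p = rng.dirichlet(np.ones(n))
    assert entropy(A @ p) >= entropy(p) - 1e-12
print("H(Ap) >= H(p) held in all trials")
```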