Proof of convexity with logs

convex-analysis, entropy, logarithms, probability-distributions

I'm trying to prove the following result. Let $p$ be an input probability distribution and $Q$ a row-stochastic transition matrix with entries $Q_{ij} = Q(j \mid i)$, so that $q$, with components $q_j = \sum_i Q_{ij}p_i$, is a valid output probability distribution.

For any $p, p'$, define

$$J(p, p') = -D(p||p') + \sum_ip_i\left(\sum_{j}Q_{ij}\log(Q_{ij})\right) - \sum_{j}q_j\log q'_j,$$

where $q'_j = \sum_i Q_{ij}p'_i$ and $D(p||p') = \sum_i p_i\log \frac{p_i}{p'_i}$ is the Kullback–Leibler divergence.

The claims are the following:

1) $J$ is concave in both $p$ and $p'$. I can see this for $p$ quite easily, and numerically it seems to be true for $p'$ as well, but how can I prove it? Somehow I need to combine the concave term $-D(p||p')$ and the convex $-\log$ term into a single concave function, but I don't know how.

2) $J(p, p') \leq I$, where $I$ is the mutual information between the input $p$ and the output $q$ of the channel $Q$, maximized over all input distributions $p$. It is given by

$$I = \max_{p}\left[\sum_ip_i\left(\sum_{j}Q_{ij}\log(Q_{ij})\right) - \sum_{j}q_j\log q_j\right]$$

Again, I have verified this numerically over random choices of $Q$ but have no proof.
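A minimal sketch of this kind of numerical check, using NumPy, might look as follows (the helper functions and the crude maximization over random candidate inputs are purely illustrative, not a tight computation of the maximum):

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_dist(n):
    """Random probability vector of length n (ad hoc sampling)."""
    x = rng.random(n) + 1e-12
    return x / x.sum()

def rand_channel(n_in, n_out):
    """Random row-stochastic matrix with Q[i, j] = Q(j | i)."""
    Q = rng.random((n_in, n_out)) + 1e-12
    return Q / Q.sum(axis=1, keepdims=True)

def mutual_info(p, Q):
    """I(p) = sum_i p_i sum_j Q_ij log Q_ij - sum_j q_j log q_j."""
    q = p @ Q
    return np.sum(p[:, None] * Q * np.log(Q)) - np.sum(q * np.log(q))

def J(p, p_prime, Q):
    """J(p, p') = -D(p||p') + sum_i p_i sum_j Q_ij log Q_ij - sum_j q_j log q'_j."""
    q, q_prime = p @ Q, p_prime @ Q
    D = np.sum(p * np.log(p / p_prime))
    return -D + np.sum(p[:, None] * Q * np.log(Q)) - np.sum(q * np.log(q_prime))

n_in, n_out = 4, 3
for _ in range(200):
    Q = rand_channel(n_in, n_out)
    p, p_prime = rand_dist(n_in), rand_dist(n_in)
    # Lower bound on max_p I(p), from evaluating I at p itself and at many
    # random inputs; if J <= this bound, the claimed inequality holds a fortiori.
    I_max = max(mutual_info(c, Q) for c in [p] + [rand_dist(n_in) for _ in range(200)])
    assert J(p, p_prime, Q) <= I_max + 1e-9
print("J(p, p') <= max_p I(p) held on all sampled instances")
```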


EDIT: For the concavity problem, here is a reformulation that follows from what I posted above but is easier to analyze. We require that the function

$$\sum_i p_i\log p'_i - \sum_j q_j\log q'_j$$

be concave in $p'$ for any choice of row-stochastic matrix $Q$, where $q_k = \sum_l Q_{lk}p_l$ and $q'_k = \sum_l Q_{lk}p'_l$.

Best Answer

First, for convenience rewrite $J(p,p')$ as \begin{align} J(p,p') &= -D(p\Vert p')+\sum_{i}p_{i}\sum_{j}Q_{ij}\log\frac{Q_{ij}}{q_{j}}+\sum_{j}q_{j}\log\frac{q_{j}}{q'_{j}} \\ &= I({p})-\left[D(p\Vert p')-D(q\Vert q')\right]\\ & = I(p) - \Delta D(p\Vert p') \end{align} where I have used your definition of $I(p)$, and defined \begin{align} \Delta D(p\Vert p')= D(p\Vert p')-D(q\Vert q') \end{align}
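As a sanity check, this rewriting can be verified numerically; here is a minimal sketch, assuming the row-stochastic convention $q_j = \sum_i Q_{ij}p_i$ (the helper names are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_dist(n):
    x = rng.random(n) + 1e-12
    return x / x.sum()

def rand_channel(n_in, n_out):
    Q = rng.random((n_in, n_out)) + 1e-12
    return Q / Q.sum(axis=1, keepdims=True)   # rows are Q(j | i)

def kl(a, b):
    return np.sum(a * np.log(a / b))

for _ in range(1000):
    Q = rand_channel(5, 4)
    p, p_prime = rand_dist(5), rand_dist(5)
    q, q_prime = p @ Q, p_prime @ Q

    # Original definition of J(p, p').
    J = -kl(p, p_prime) + np.sum(p[:, None] * Q * np.log(Q)) - np.sum(q * np.log(q_prime))

    # Rewritten form I(p) - [D(p||p') - D(q||q')].
    I_p = np.sum(p[:, None] * Q * np.log(Q / q))
    delta_D = kl(p, p_prime) - kl(q, q_prime)

    assert np.isclose(J, I_p - delta_D)
print("J(p, p') = I(p) - Delta D(p||p') verified on random instances")
```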

$\Delta D(p\Vert p')$ reflects the "contraction" of the Kullback-Leibler divergence between $p$ and $p'$ under the stochastic matrix $Q$. $\Delta D$ is convex in the first argument. To see why, consider any two distributions $p$ and $\bar{p}$, and define the convex mixture $p^\alpha = (1-\alpha) p + \alpha \bar{p}$. We will show convexity by demonstrating that the second derivative with respect to $\alpha$, evaluated at $\alpha=0$, is non-negative for every such pair.

First, compute the first derivative w.r.t. $\alpha$:
\begin{align*}
{\textstyle \frac{d}{d\alpha}}\Delta D(p^{\alpha}\Vert p') &= {\textstyle \frac{d}{d\alpha}}\left[\sum_{i}p_{i}^{\alpha}\log p_{i}^{\alpha}-\sum_{i}p_{i}^{\alpha}\sum_{j}Q_{ij}\log q_{j}^{\alpha}+\sum_{i}p_{i}^{\alpha}\sum_{j}Q_{ij}\log\frac{q'_{j}}{p'_{i}}\right]\\
&=\sum_{i}(\bar{p}_{i}-p_{i})\log p_{i}^{\alpha}-\sum_{i}(\bar{p}_{i}-p_{i})\sum_{j}Q_{ij}\log q_{j}^{\alpha}+\sum_{i}(\bar{p}_{i}-p_{i})\sum_{j}Q_{ij}\log\frac{q'_{j}}{p'_{i}},
\end{align*}
where the terms produced by differentiating inside the logarithms vanish because $\sum_i(\bar{p}_i-p_i)=0$ and $\sum_j(\bar{q}_j-q_j)=0$. Differentiating once more and evaluating at $\alpha=0$ gives
\begin{align}
{\textstyle \frac{d^{2}}{d\alpha^{2}}}\Delta D(p^{\alpha}\Vert p')\vert_{\alpha=0}=\sum_{i}\frac{\left(\bar{p}_{i}-p_{i}\right)^{2}}{p_{i}}-\sum_{j}\frac{\left(\bar{q}_{j}-q_{j}\right)^{2}}{q_{j}},
\end{align}
where $\bar{q}_j=\sum_i Q_{ij}\bar{p}_i$. The first sum, $\sum_{i}\frac{\left(\bar{p}_{i}-p_{i}\right)^{2}}{p_{i}}$, is the $\chi^2$ divergence $\chi^2(\bar{p}\Vert p)$, and the second sum, $\sum_{j}\frac{\left(\bar{q}_{j}-q_{j}\right)^{2}}{q_{j}}$, is the corresponding divergence $\chi^2(\bar{q}\Vert q)$ after $p$ and $\bar{p}$ are pushed through $Q$. The $\chi^2$ divergence is a special case of an $f$-divergence and therefore obeys a data-processing inequality (see e.g. Liese and Vajda, IEEE Trans. Inf. Theory, 2006, Thm. 14): $\chi^2(\bar{q}\Vert q)\le\chi^2(\bar{p}\Vert p)$. Hence ${\textstyle \frac{d^{2}}{d\alpha^{2}}}\Delta D(p^{\alpha}\Vert p')\vert_{\alpha=0} \ge 0$ for every choice of $p$ and $\bar{p}$, which is exactly convexity in the first argument.
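The data-processing step for the $\chi^2$ divergence can also be spot-checked numerically; a minimal sketch under the same row-stochastic convention (the sampling scheme is ad hoc) is:

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_dist(n):
    x = rng.random(n) + 1e-12
    return x / x.sum()

def chi2(a, b):
    """chi^2(a || b) = sum_i (a_i - b_i)^2 / b_i."""
    return np.sum((a - b) ** 2 / b)

for _ in range(1000):
    n_in, n_out = 5, 3
    Q = rng.random((n_in, n_out)) + 1e-12
    Q /= Q.sum(axis=1, keepdims=True)     # rows are Q(j | i)
    p, p_bar = rand_dist(n_in), rand_dist(n_in)
    q, q_bar = p @ Q, p_bar @ Q           # push both distributions through Q
    # Data processing: chi^2 cannot increase under a stochastic map.
    assert chi2(q_bar, q) <= chi2(p_bar, p) + 1e-12
print("chi^2(q_bar || q) <= chi^2(p_bar || p) held on all sampled instances")
```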

At the same time, $\Delta D(p\Vert p')$ is not convex in the second argument. Consider $p = (0.5,0.5,0)$, $r=(0.5,0.25,0.25)$, and the row-stochastic matrix $Q = \left( \begin{smallmatrix} 0.95 & 0.05 \\ 1 & 0 \\ 0 & 1\end{smallmatrix} \right)$. Here is a plot of $\Delta D$ where the first argument is held fixed at $p$ and the second argument is the convex mixture $p' = \alpha p + (1-\alpha) r$, for different $\alpha$:

[Plot of $\Delta D(p\Vert \alpha p + (1-\alpha)r)$ as a function of $\alpha$]

(see code at https://pastebin.com/q8XLnGK8)
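In case the link rots, a minimal reconstruction of that kind of script, using NumPy and matplotlib, is below; it is an illustrative sketch, not the linked code itself:

```python
import numpy as np
import matplotlib.pyplot as plt

def kl(a, b):
    """KL divergence with the convention 0 * log(0/x) = 0."""
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))

def delta_D(p, p_prime, Q):
    """Delta D(p||p') = D(p||p') - D(q||q') with q_j = sum_i Q_ij p_i."""
    return kl(p, p_prime) - kl(p @ Q, p_prime @ Q)

p = np.array([0.5, 0.5, 0.0])
r = np.array([0.5, 0.25, 0.25])
Q = np.array([[0.95, 0.05],
              [1.0,  0.0],
              [0.0,  1.0]])

alphas = np.linspace(0.0, 1.0, 201)
vals = [delta_D(p, a * p + (1 - a) * r, Q) for a in alphas]

plt.plot(alphas, vals)
plt.xlabel(r"$\alpha$")
plt.ylabel(r"$\Delta D(p \,\|\, \alpha p + (1 - \alpha) r)$")
plt.show()
```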

By visual inspection, it can be verified that $\Delta D(p\Vert p')$ is neither convex nor concave in the second argument.

Regarding your questions:

(1) $I({p})$ is known to be concave in $p$ (Theorem 2.7.4 in Cover and Thomas, 2006). As we've shown, $\Delta D(p\Vert p')$ is convex in $p$, so $-\Delta D(p\Vert p')$ is concave in $p$. Since the sum of two concave functions is concave, $J(p,p') = I(p) - \Delta D(p\Vert p')$ is concave in $p$.

At the same time, as a function of the second argument $p'$, $J(p,p') = \mathrm{const} - \Delta D(p\Vert p')$, and we've shown above that $\Delta D(p\Vert p')$ is neither convex nor concave in the second argument. Thus, $J(p,p')$ is not concave in the second argument.

(2) $\Delta D(p\Vert p') \ge 0$ by the data processing inequality for KL divergence (Csiszár and Körner, 2011, Lemma 3.11). That means that \begin{align} J(p,p') = I(p) - \Delta D(p\Vert p') \le I(p) , \end{align} from which $J(p,p') \le \max_s I(s)$ follows immediately.
