[Math] Generalizing inequality relating Euclidean distance & Frobenius norm to Bregman divergences such as relative entropy & von Neumann divergence

Tags: convexity, inequalities, it.information-theory, linear algebra, matrices

Motivation: A Special Case

Suppose $A,B\in\mathbb{S}^{m\times m}$ are symmetric positive semi-definite matrices and $\mathbf{x}\in\mathbb{R}^m$ is a unit vector with $\|\mathbf{x}\|=1$. We found that the squared Euclidean distance between the two quadratic forms, $\left(\mathbf{x}^\top A\mathbf{x}-\mathbf{x}^\top B\mathbf{x}\right)^2$, is bounded by the squared Frobenius norm of the difference of the two matrices, $\|A-B\|_F^2$.

Denoting the spectral decomposition of $A-B$ as $A-B=W\Phi W^\top$ where $\Phi=\mathrm{diag}\left(\phi_1,\phi_2,\ldots,\phi_m\right)$ is a diagonal matrix of eigenvalues, we have
\begin{eqnarray}
\left(\mathbf{x}^\top A\mathbf{x}-\mathbf{x}^\top B\mathbf{x}\right)^2
&=&\left(\mathbf{x}^\top(A-B)\mathbf{x}\right)^2
=\left(\mathbf{x}^\top W\Phi W^\top\mathbf{x}\right)^2\\
&=&\left(\sum_i{x_{W,i}^2\phi_i}\right)^2
\leq\max_i{\phi_i^2}\leq\sum_i{\phi_i^2}
=\|A-B\|_F^2
\end{eqnarray}

where $W^\top\mathbf{x}=\mathbf{x}_W=\left[x_{W,1}\;x_{W,2}\;\ldots\;x_{W,m}\right]^\top$ and $\mathbf{x}_W^\top\mathbf{x}_W=\sum_i{x_{W,i}^2}=1$, so that $\sum_i{x_{W,i}^2\phi_i}$ is a convex combination of the eigenvalues $\phi_i$.

Therefore, for all $\mathbf{x}\in\mathbb{R}^m$ with $\|\mathbf{x}\|=1$, we have $\left(\mathbf{x}^\top A\mathbf{x}-\mathbf{x}^\top B\mathbf{x}\right)^2\leq\|A-B\|_F^2$.
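
A quick numerical sanity check of this bound, as a minimal Python/NumPy sketch (the dimension `m` and the way $A$, $B$, $\mathbf{x}$ are drawn below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5

# Random symmetric positive semi-definite matrices A and B.
MA, MB = rng.standard_normal((m, m)), rng.standard_normal((m, m))
A, B = MA @ MA.T, MB @ MB.T

# Random unit vector x.
x = rng.standard_normal(m)
x /= np.linalg.norm(x)

lhs = (x @ A @ x - x @ B @ x) ** 2          # squared distance of the quadratic forms
rhs = np.linalg.norm(A - B, 'fro') ** 2     # squared Frobenius norm of A - B
print(lhs <= rhs + 1e-12)                   # expected: True
```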

Question: Could this be generalized?

Note that the squared Euclidean distance is a special case of the Bregman divergence
$$
D_\varphi(\mathbf{x},\mathbf{y})=\varphi(\mathbf{x})-\varphi(\mathbf{y})-\nabla\varphi(\mathbf{y})^\top(\mathbf{x}-\mathbf{y})
$$

where $\varphi$ is the convex seed function.
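
For concreteness, here is a minimal sketch of this definition in Python/NumPy (the two seed functions below, the squared norm and the negative-entropy seed used later in the post, are just examples):

```python
import numpy as np

def bregman(x, y, phi, grad_phi):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Seed phi(x) = x^T x gives the squared Euclidean distance ...
sq, sq_grad = (lambda x: x @ x), (lambda x: 2 * x)
# ... and phi(x) = sum_i (x_i log x_i - x_i) gives the generalized KL divergence.
negent, negent_grad = (lambda x: np.sum(x * np.log(x) - x)), (lambda x: np.log(x))

x, y = np.array([0.2, 0.5, 0.3]), np.array([0.4, 0.4, 0.2])
print(bregman(x, y, sq, sq_grad), np.sum((x - y) ** 2))            # equal
print(bregman(x, y, negent, negent_grad),
      np.sum(x * (np.log(x) - np.log(y)) - x + y))                 # equal
```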

On the other hand, the squared Frobenius norm of the difference of two matrices is a special case of the Bregman matrix divergence
$$
D_\phi(A,B)=\phi(A)-\phi(B)-\mathrm{tr}\left((\nabla\phi(B))^\top(A-B)\right)
$$

where $\phi(A)=(\varphi\circ\lambda)(A)$ is a compound matrix function in which $\lambda$ is the function that lists the eigenvalues of $A$ and $\varphi$ is the convex seed function.
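
A minimal sketch of the matrix version, under the assumption that the seed is separable, i.e. $\varphi(\mathbf{x})=\sum_i f(x_i)$ (as in all the examples here), so that $\nabla\phi(B)$ can be computed from the eigendecomposition of $B$:

```python
import numpy as np

def bregman_matrix(A, B, f, df):
    """D_phi(A, B) = phi(A) - phi(B) - tr(grad phi(B)^T (A - B)) for the spectral
    seed phi(A) = sum_i f(lambda_i(A)); for such a separable seed,
    grad phi(B) = U diag(df(theta)) U^T where B = U diag(theta) U^T."""
    lam, _ = np.linalg.eigh(A)
    theta, U = np.linalg.eigh(B)
    grad_B = U @ np.diag(df(theta)) @ U.T
    return f(lam).sum() - f(theta).sum() - np.trace(grad_B.T @ (A - B))

rng = np.random.default_rng(1)
m = 4
MA, MB = rng.standard_normal((m, m)), rng.standard_normal((m, m))
A, B = MA @ MA.T, MB @ MB.T          # positive definite with probability 1

# f(t) = t^2 recovers the squared Frobenius norm of A - B ...
print(bregman_matrix(A, B, lambda t: t ** 2, lambda t: 2 * t),
      np.linalg.norm(A - B, 'fro') ** 2)
# ... and f(t) = t*log(t) - t gives the von Neumann divergence used below.
print(bregman_matrix(A, B, lambda t: t * np.log(t) - t, lambda t: np.log(t)))
```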

In the example above, the seed function is $\varphi(\mathbf{x})=\mathbf{x}^\top\mathbf{x}$ and we can rewrite the inequality as
$$
D_\varphi\left(\mathbf{x}^\top A\mathbf{x},\mathbf{x}^\top B\mathbf{x}\right)
\leq D_\phi(A,B)
$$

where $\|\mathbf{x}\|=1$ and $\phi=\varphi\circ\lambda$. The function $\lambda$ lists the eigenvalues of the matrix argument.

Using the property of the Bregman matrix divergence that, for a separable seed, it can be written in terms of the eigendecompositions as $D_\phi(A,B)=\sum_i\sum_j{(\mathbf{v}_i^\top\mathbf{u}_j)^2D_\varphi(\lambda_i,\theta_j)}$, the inequality can also be written as
\begin{eqnarray}
D_\varphi\left(\mathbf{x}^\top V\Lambda V^\top\mathbf{x},
\mathbf{x}^\top U\Theta U^\top\mathbf{x}\right)
&=&D_\varphi\left(\sum_i(\mathbf{v}_i^\top\mathbf{x})^2\lambda_i,\sum_j(\mathbf{u}_j^\top\mathbf{x})^2\theta_j\right)\\
&\leq&\sum_i\sum_j{(\mathbf{v}_i^\top\mathbf{u}_j)^2D_\varphi(\lambda_i,\theta_j)}
\end{eqnarray}

where $A=V\Lambda V^\top,B=U\Theta U^\top$ are spectral decompositions and $\mathbf{v}_i,\mathbf{u}_j$ are columns of $V,U$ respectively.
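
A quick numerical check of this rewriting for the seed $\varphi(x)=x^2$, where the weighted double sum on the right should reproduce $\|A-B\|_F^2$ exactly (the random construction below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4
MA, MB = rng.standard_normal((m, m)), rng.standard_normal((m, m))
A, B = MA @ MA.T, MB @ MB.T

lam, V = np.linalg.eigh(A)        # A = V diag(lam) V^T
theta, U = np.linalg.eigh(B)      # B = U diag(theta) U^T

# For phi(x) = x^2 we have D_phi(lam_i, theta_j) = (lam_i - theta_j)^2, and the
# weighted double sum equals ||A - B||_F^2.
W = (V.T @ U) ** 2                # W[i, j] = (v_i^T u_j)^2, doubly stochastic
rhs = np.sum(W * (lam[:, None] - theta[None, :]) ** 2)
print(rhs, np.linalg.norm(A - B, 'fro') ** 2)       # equal up to rounding

# Left-hand side for a random unit vector x.
x = rng.standard_normal(m)
x /= np.linalg.norm(x)
print((x @ A @ x - x @ B @ x) ** 2 <= rhs + 1e-9)   # expected: True
```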

My question is: can this inequality be extended to general Bregman divergences and Bregman matrix divergences with other choices of the seed function?

Or, under what conditions does such an inequality hold?

For example, if $\varphi(\mathbf{x})=\sum_i{x_i\log x_i-x_i}$,

then $D_\varphi$ is the relative entropy (KL-divergence)
$$\mathrm{KL}(\mathbf{x},\mathbf{y})=\sum_i\left(x_i(\log x_i-\log y_i)-x_i+y_i\right),$$

and $D_\phi$ is the von Neumann divergence
$$D_{vN}(A,B)=\mathrm{tr}\left(A\log A-A\log B-A+B\right).$$

In this case, does the following inequality hold for all $\mathbf{x}\in\mathbb{R}^m$ satisfying $\|\mathbf{x}\|=1$?
$$
\mathrm{KL}\left(\mathbf{x}^\top A\mathbf{x},\mathbf{x}^\top B\mathbf{x}\right)
\leq D_{vN}(A,B)
$$

I ran many Matlab experiments on this inequality between the relative entropy and the von Neumann divergence, using randomly generated symmetric positive semi-definite matrices, and it always held.
However, does it really hold in general?
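
For reference, here is a minimal Python/NumPy version of such a randomized check (the experiments above were done in Matlab; the matrix size, trial count, and sampling scheme below are arbitrary choices):

```python
import numpy as np

def vn_div(A, B):
    """Von Neumann divergence tr(A log A - A log B - A + B), A and B positive definite."""
    la, Va = np.linalg.eigh(A)
    lb, Vb = np.linalg.eigh(B)
    logA = Va @ np.diag(np.log(la)) @ Va.T
    logB = Vb @ np.diag(np.log(lb)) @ Vb.T
    return np.trace(A @ (logA - logB) - A + B)

def kl(p, q):
    """Generalized KL divergence p log(p/q) - p + q for positive scalars."""
    return p * (np.log(p) - np.log(q)) - p + q

rng = np.random.default_rng(3)
m, trials = 5, 1000
violations = 0
for _ in range(trials):
    MA, MB = rng.standard_normal((m, m)), rng.standard_normal((m, m))
    A, B = MA @ MA.T, MB @ MB.T          # positive definite with probability 1
    x = rng.standard_normal(m)
    x /= np.linalg.norm(x)
    if kl(x @ A @ x, x @ B @ x) > vn_div(A, B) + 1e-9:
        violations += 1
print(violations)                        # 0 in these trials
```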

Could anyone please give me some help with this question or recommend some relevant papers? Any suggestions will be appreciated. Thank you very much!

Best Answer

I found a proof for the case $\varphi(\mathbf{x})=\sum_i{x_i\log x_i-x_i}$. If you find any mistake in the proof, please let me know. Thank you!

The case $\varphi(\mathbf{x})=\sum_i{x_i\log x_i-x_i}$ can be proved with a method similar to Lindblad, Completely positive maps and entropy inequalities, 1975, and Lindblad, Expectations and entropy inequalities for finite quantum systems, 1974. The inequality can in fact be strengthened to $$ \sum_i{\mathrm{KL}\left(\mathbf{x}_i^\top A\mathbf{x}_i,\mathbf{x}_i^\top B\mathbf{x}_i\right)}\leq D_{vN}\left(A,B\right) $$ where the $\mathbf{x}_i$ form an orthonormal basis (the single-vector inequality above follows by completing $\mathbf{x}$ to such a basis, since every KL term is nonnegative).

Actually, a very similar result has been established in papers on quantum information theory, such as the two Lindblad papers referred to above. The result is that for any trace-preserving map $\Phi$, given by $\Phi(A)=\sum_{i=1}^n{V_iAV_i^\top}$ with $\sum_{i=1}^n{V_i^\top V_i}=\mathbf{I}$, we have $D_\phi\left(\Phi(A),\Phi(B)\right)\leq D_\phi(A,B)$, where $A,B$ are density operators, i.e., Hermitian positive definite matrices satisfying $\mathrm{tr}A=\mathrm{tr}B=1$, and $\varphi(x)=x\log x$.

We found that if the trace constraints $\mathrm{tr}A=\mathrm{tr}B=1$ are dropped and $\varphi(x)=x\log x$ is replaced with $\varphi(x)=x\log x-x$, the inequality still holds.

The proof is outlined as follows (a numerical sanity check of each step is sketched after the list):

  1. The von Neumann divergence has the following additivity property under the Kronecker product: $$D_{vN}(A\otimes P,B\otimes P)=D_{vN}(A,B)\cdot\mathrm{tr}P$$

  2. Using the joint convexity and the additivity, we can prove that the von Neumann divergence is monotone under the partial trace. Writing $\mathrm{tr}_2(\cdot)\otimes\frac{\mathbf{I}_2}{m}$ as a mixture of unitary conjugations $\sum_{j=1}^N{p_jW_j(\cdot)W_j^+}$, where the $W_j$ are unitary, the $p_j$ are probabilities, $\mathbf{I}_2$ is the identity on the second subsystem and $m$ is its dimension (so $\mathrm{tr}\left(\frac{\mathbf{I}_2}{m}\right)=1$), we have \begin{equation*} \begin{split} D_{vN}(\mathrm{tr}_2(A),\mathrm{tr}_2(B)) =&D_{vN}\left(\mathrm{tr}_2(A)\otimes\frac{\mathbf{I}_2}{m}, \mathrm{tr}_2(B)\otimes\frac{\mathbf{I}_2}{m}\right) /\mathrm{tr}\left(\frac{\mathbf{I}_2}{m}\right)\\ =&D_{vN}\left(\sum_{j=1}^N{p_jW_jAW_j^+},\sum_{j=1}^N{p_jW_jBW_j^+}\right)\\ \leq&\sum_{j=1}^{N}{p_jD_{vN}\left(W_jAW_j^+,W_jBW_j^+\right)}\\ =&D_{vN}(A,B)\end{split} \end{equation*}

  3. Any trace-preserving map $\Phi$, given by $\Phi(A)=\sum_{i=1}^n{V_iAV_i^\top}$ with $\sum_{i=1}^n{V_i^\top V_i}=\mathbf{I}$, can be represented as a unitary operation followed by a partial trace: $\Phi(A)=\mathrm{tr}_2\left(W(A\otimes\mathbf{s}\mathbf{s}^\top)W^\top\right)$ for a suitable unitary $W$ and unit vector $\mathbf{s}$. Therefore, using step 2 and then the unitary invariance of $D_{vN}$ together with step 1 (since $\mathrm{tr}(\mathbf{s}\mathbf{s}^\top)=1$), we have \begin{equation*} \begin{split} D_{vN}\left(\Phi(A),\Phi(B)\right) =&D_{vN}\left(\mathrm{tr}_2(W(A\otimes\mathbf{s}\mathbf{s}^\top)W^\top), \mathrm{tr}_2(W(B\otimes\mathbf{s}\mathbf{s}^\top)W^\top)\right)\\ \leq&D_{vN}\left(W(A\otimes\mathbf{s}\mathbf{s}^\top)W^\top, W(B\otimes\mathbf{s}\mathbf{s}^\top)W^\top\right)\\ =&D_{vN}\left(A,B\right) \end{split} \end{equation*}

  4. Then the sum of the relative entropies of the quadratic forms can be written as a matrix divergence and bounded: \begin{equation*} \begin{split} \sum_i{\mathrm{KL}\left(\mathbf{x}_i^\top A\mathbf{x}_i,\mathbf{x}_i^\top B\mathbf{x}_i\right)} =&\sum_{i,j}{(\mathbf{x}_i^\top\mathbf{x}_j)^2 \mathrm{KL}(\mathbf{x}_i^\top A\mathbf{x}_i,\mathbf{x}_j^\top B\mathbf{x}_j)}\\ =&D_{vN}\left(\sum_i{X_iAX_i^\top},\sum_i{X_iBX_i^\top}\right)\\ \leq&D_{vN}\left(A,B\right) \end{split} \end{equation*} where $X_i=\mathbf{x}_i\mathbf{x}_i^\top$ and $\{\mathbf{x}_i\}$ is an orthonormal basis, so that $(\mathbf{x}_i^\top\mathbf{x}_j)^2=\delta_{ij}$, $\sum_i{X_i^\top X_i}=\mathbf{I}$, and step 3 applies to the map $A\mapsto\sum_i{X_iAX_i^\top}$.
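
As mentioned above, each step can be checked numerically. A minimal Python/NumPy sketch (the helper names `vn_div`, `kl`, `ptrace2`, `rand_pd`, the matrix sizes, and the random constructions are arbitrary illustrative choices, not taken from the referenced papers):

```python
import numpy as np

def vn_div(A, B):
    """Von Neumann divergence tr(A log A - A log B - A + B), A and B positive definite."""
    la, Va = np.linalg.eigh(A)
    lb, Vb = np.linalg.eigh(B)
    logA = Va @ np.diag(np.log(la)) @ Va.T
    logB = Vb @ np.diag(np.log(lb)) @ Vb.T
    return np.trace(A @ (logA - logB) - A + B)

def kl(p, q):
    """Generalized KL divergence p log(p/q) - p + q for positive scalars."""
    return p * (np.log(p) - np.log(q)) - p + q

def ptrace2(M, m, k):
    """Partial trace over the second factor of R^m (x) R^k."""
    return M.reshape(m, k, m, k).trace(axis1=1, axis2=3)

def rand_pd(rng, m):
    """Random positive definite matrix (almost surely)."""
    M = rng.standard_normal((m, m))
    return M @ M.T

rng = np.random.default_rng(0)
m, k, n = 3, 2, 4
A, B = rand_pd(rng, m), rand_pd(rng, m)

# Step 1: additivity under the Kronecker product, D_vN(A(x)P, B(x)P) = D_vN(A,B)*tr(P).
P = rand_pd(rng, k)
print(np.isclose(vn_div(np.kron(A, P), np.kron(B, P)), vn_div(A, B) * np.trace(P)))

# Step 2: monotonicity under the partial trace, D_vN(tr_2 A', tr_2 B') <= D_vN(A', B').
A2, B2 = rand_pd(rng, m * k), rand_pd(rng, m * k)
print(vn_div(ptrace2(A2, m, k), ptrace2(B2, m, k)) <= vn_div(A2, B2) + 1e-9)

# Step 3: monotonicity under a trace-preserving map Phi(A) = sum_i V_i A V_i^T with
# sum_i V_i^T V_i = I. Random Kraus operators: orthonormalize the columns of a
# random (n*m) x m matrix and cut it into n blocks of size m x m.
Q, _ = np.linalg.qr(rng.standard_normal((n * m, m)))
Vs = [Q[i * m:(i + 1) * m, :] for i in range(n)]
assert np.allclose(sum(V.T @ V for V in Vs), np.eye(m))
Phi = lambda M: sum(V @ M @ V.T for V in Vs)
print(vn_div(Phi(A), Phi(B)) <= vn_div(A, B) + 1e-9)

# Step 4: the strengthened inequality for an orthonormal basis {x_1, ..., x_m}
# (columns of a random orthogonal matrix).
X, _ = np.linalg.qr(rng.standard_normal((m, m)))
lhs = sum(kl(X[:, i] @ A @ X[:, i], X[:, i] @ B @ X[:, i]) for i in range(m))
print(lhs <= vn_div(A, B) + 1e-9)
```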

If there is any mistake in the proof, please let me know. Any other suggestions are also welcome. Thank you very much!
