Trace of product of many matrices related by unitary rotation

eigenvalues-eigenvectors, linear-algebra, matrices, trace

Suppose $\Lambda$ is an $n \times n$ real diagonal matrix with positive entries on the diagonal. I'm interested in the quantity
$$
S = {\rm Tr}[U_1 \Lambda U_1^\dagger\, U_2 \Lambda U_2^\dagger\, \cdots\, U_m \Lambda U_m^\dagger]
$$

for finite $m$, where each $U_i$ is an arbitrary $n \times n$ complex unitary matrix.

My question is: is it true that $\text{Re}(S) \leq {\rm Tr}[\Lambda^m]$, i.e. is it maximal when all $\{U_i\}$ are equal? If so, does anything change if $\Lambda$ is non-negative rather than positive?

I have been able to prove that this is true for $m = 2$ by using this answer and setting $D = \Lambda$ and $X = U_1^\dagger U_2 \Lambda U_2^\dagger U_1$. To me it seems intuitive that it would hold for all $m$, but I haven't been able to generalize. Any help would be much appreciated!

N.B. If it's easier, I could assume that $U_i$ are real orthogonal rather than complex unitary. But the more general case would be preferred.
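Before reading the answer below, it may help to see the conjecture hold numerically. The following is a quick sanity check with NumPy (a sketch; the random unitaries are drawn via QR of a complex Gaussian matrix, and `n`, `m`, and the diagonal of $\Lambda$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n, rng):
    # QR of a complex Gaussian matrix; fixing the phases of R's diagonal
    # makes the resulting Q Haar-distributed
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

n, m = 4, 5
lam = np.diag(rng.uniform(0.1, 2.0, size=n))     # positive diagonal Lambda
bound = np.trace(np.linalg.matrix_power(lam, m)).real

for _ in range(500):
    P = np.eye(n, dtype=complex)
    for _ in range(m):
        U = random_unitary(n, rng)
        P = P @ U @ lam @ U.conj().T             # U_i Lambda U_i^dagger
    S = np.trace(P)
    assert S.real <= bound + 1e-9                # conjectured bound holds
print("Re(S) <= Tr(Lambda^m) in every trial")
```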

Best Answer

Notation: for a matrix $X$, let $\Sigma_X$ denote the diagonal matrix of singular values of $X$, with the ordering $\sigma_1\geq \sigma_2\geq \dots\geq \sigma_n$.
For the purposes of OP's problem we only need to consider invertible matrices.

The desired inequality is:
$\text{real}\Big(\text{trace}\big(A_1A_2...A_m\big)\Big) \leq \Big\vert \text{trace}\big(A_1A_2...A_m\big)\Big\vert \leq \text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big) \leq \text{trace}\Big(\Sigma_{A_1}^m\Big)$
where the first inequality comes from the fact that for a complex number $a+bi$ we have $a \leq \sqrt{a^2+b^2} = \vert a+bi\vert$, the second inequality is the von Neumann Trace Inequality (or use the Polar Decomposition), and the final inequality is proven below.
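The middle link in that chain, $\vert\text{trace}(X)\vert \leq \text{trace}(\Sigma_X)$, is easy to spot-check numerically (a sketch; any complex square matrix will do):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))

t = np.trace(X)
sing_sum = np.linalg.svd(X, compute_uv=False).sum()  # trace of Sigma_X

# Re(tr X) <= |tr X| <= sum of singular values of X
assert t.real <= abs(t) <= sing_sum + 1e-12
print(t.real, abs(t), sing_sum)
```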

core result:
a result due to Horn (1950); see page 338 of Marshall, Olkin, and Arnold, *Inequalities: Theory of Majorization and Its Applications*:

for invertible matrices $A$ and $B$
$\log\big(\Sigma_{AB}\big)\preceq \log\big(\Sigma_{A}\big)+\log\big(\Sigma_{B}\big)$
where $\preceq$ denotes (strong) majorization. Note the equality case (i.e. why the majorization is strong, not weak), namely $\sum_{k=1}^n \log\big(\sigma_k(AB)\big) = \sum_{k=1}^n \Big(\log\big(\sigma_k(A)\big)+\log\big(\sigma_k(B)\big)\Big)$, should be clear: exponentiate each side, and this is equivalent to $\det\big(\Sigma_{AB}\big) = \big\vert \det(AB)\big\vert = \big\vert \det(A)\big\vert\,\big\vert \det(B)\big\vert = \det\big(\Sigma_{A}\big)\det\big(\Sigma_{B}\big)$
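Horn's log-majorization can also be checked numerically (a sketch; note `np.linalg.svd` returns singular values in descending order, so taking logs preserves the required ordering):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

# log singular values, already sorted largest-to-smallest
logs = lambda M: np.log(np.linalg.svd(M, compute_uv=False))
x, yz = logs(A @ B), logs(A) + logs(B)

# weak part: every prefix sum of log sigma(AB) is bounded by
# the matching prefix sum of log sigma(A) + log sigma(B)
assert np.all(np.cumsum(x) <= np.cumsum(yz) + 1e-9)
# strong part (equal totals): |det(AB)| = |det A| |det B|
assert np.isclose(x.sum(), yz.sum())
```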

OP's Problem has
$A_k:= U_k\Lambda U_k^*$ so each $A_k$ has the same singular values and we need to prove
$\text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big) \leq \text{trace}\Big(\Sigma_{A_1}^m\Big)$
to do this we go through the following main claim about log-majorization

main claim:
$\log\big(\Sigma_{A_1A_2...A_m}\big)\preceq \log\big(\Sigma_{A_1}\big)+\log\big(\Sigma_{A_2}\big)+...+\log\big(\Sigma_{A_m}\big)= m \cdot \log\big(\Sigma_{A_1}\big)= \log\big(\Sigma_{A_1}^m\big)$
we defer the main claim proof momentarily and note what this implies:
For $u\in \mathbb R$ the function $u\mapsto e^u$ is (strictly) convex, so for a real diagonal matrix $D$ the function $g$ given by composing the trace with the matrix exponential, i.e. $g\big(D\big) =\text{trace}\big(\exp(D)\big) = \sum_{k=1}^n e^{d_{k,k}}$, is a Schur convex function, hence

$\log\big(\Sigma_{A_1A_2...A_m}\big)\preceq \log\big(\Sigma_{A_1}^m\big)$
$\implies \text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big)= g\Big(\log\big(\Sigma_{A_1A_2...A_m}\big)\Big)\leq g\Big(\log\big(\Sigma_{A_1}^m\big)\Big) =\text{trace}\Big(\Sigma_{A_1}^m\Big)$
which completes the proof of the final inequality, given the main claim.
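The Schur-convexity step can be illustrated on a small pair of majorized vectors (a sketch; the vectors are arbitrary choices satisfying $\mathbf x \preceq \mathbf y$):

```python
import numpy as np

# x is majorized by y: equal totals, and prefix sums of x
# (sorted largest-to-smallest) never exceed those of y
x = np.array([2.0, 1.0, 1.0])
y = np.array([3.0, 1.0, 0.0])
assert np.all(np.cumsum(x) <= np.cumsum(y)) and np.isclose(x.sum(), y.sum())

# Schur convexity of t -> sum(exp(t)): majorization implies
# trace(exp(diag(x))) <= trace(exp(diag(y)))
assert np.exp(x).sum() <= np.exp(y).sum()
print(np.exp(x).sum(), np.exp(y).sum())
```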

proof main claim:
Proceeding by induction.
Base Case:
$\log\big(\Sigma_{A_1A_2}\big)\preceq \log\big(\Sigma_{A_1}\big)+\log\big(\Sigma_{A_{2}}\big)$
by the result from Horn at top.

Inductive Case:
for natural numbers $m\geq 3$
we know
$ \log\big(\Sigma_{A_1A_2...A_{m-1}}\big)\preceq \sum_{k=1}^{m-1}\log\big(\Sigma_{A_k}\big)$
by inductive hypothesis
and need to prove
$ \log\big(\Sigma_{A_1A_2...A_{m-1}A_m}\big)\preceq \sum_{k=1}^{m}\log\big(\Sigma_{A_k}\big)$

combining the Horn result (with bunching) and Inductive Hypothesis we have
(i) $ \log\big(\Sigma_{A_1A_2...A_{m-1}A_m}\big)= \log\big(\Sigma_{(A_1A_2...A_{m-1})A_m}\big) \preceq \log\big(\Sigma_{A_1A_2...A_{m-1}}\big)+\log\big(\Sigma_{A_{m}}\big)$ and
(ii) $ \log\big(\Sigma_{A_1A_2...A_{m-1}}\big)\preceq \sum_{k=1}^{m-1}\log\big(\Sigma_{A_k}\big)$

To avoid notational clutter, I'm going to re-write this in vector notation as
(i) $\mathbf x \preceq \mathbf y + \mathbf z_m$ and
(ii) $\mathbf y \preceq \sum_{k=1}^{m-1}\mathbf z_k$
$\implies \mathbf x\preceq \big(\sum_{k=1}^{m-1}\mathbf z_k\big)+\mathbf z_m=\big(\sum_{k=1}^{m}\mathbf z_k\big)$
(keeping with the convention for singular value ordering, the vector components are ordered from largest to smallest)

The implication is almost immediate. Writing out the definition of majorization, we have for $r\in \big\{1,2,\dots,n\big\}$
(i) $\sum_{i=1}^r x_{i} \leq \big(\sum_{i=1}^r y_i \big) +\big(\sum_{i=1}^r z^{(m)}_i\big)$
(ii) $\big(\sum_{i=1}^r y_i \big) \leq \sum_{k=1}^{m-1}\big(\sum_{i=1}^r z^{(k)}_i\big)$
$\implies \sum_{i=1}^r x_{i} \leq \sum_{k=1}^{m-1}\big(\sum_{i=1}^r z^{(k)}_i\big) +\big(\sum_{i=1}^r z^{(m)}_i\big)$
and we recover
$\mathbf x\preceq \big(\sum_{k=1}^{m}\mathbf z_k\big)$ i.e. we've proven
$ \log\big(\Sigma_{A_1A_2...A_{m-1}A_m}\big)\preceq \sum_{k=1}^{m}\log\big(\Sigma_{A_k}\big)$

(Technically the above only shows weak majorization but the fact that it is strong is immediate -- either the earlier determinant argument will do it, or note that (i) says $\text{sum}\big(\mathbf x\big)= \text{sum}\big(\mathbf y + \mathbf z_m\big) = \text{sum}\big(\mathbf y\big) + \text{sum}\big(\mathbf z_m\big)$ and (ii) says $\text{sum}\big(\mathbf y\big) = \text{sum}\big(\sum_{k=1}^{m-1}\mathbf z_k\big) = \sum_{k=1}^{m-1}\text{sum}\big(\mathbf z_k\big)$, and combining these gives the equality case.)

other questions in the OP:

is it maximal when all $\{U_i\}$ are equal?

Yes. The above proves this. Note: this is a sufficient condition, not a necessary one. E.g. $\Lambda =\delta I$ always meets the equality case. However, for $\Lambda \not \propto I$ equal unitaries are essentially the unique maximizer, in the sense that we may always choose a collection of unitary matrices where the inequality is strict. To prove this one need only consider the group of permutation matrices $S_n\subset U_n$, choosing $P_k:= I$ for $k\in\big\{1,2,\dots,m-1\big\}$ and letting $P_m$ be the permutation matrix (that is, a type 2 elementary matrix) which swaps two distinct diagonal entries of $\Lambda$ (such entries exist since $\Lambda \not\propto I$). The rearrangement inequality then tells us the upper bound is strict.
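This strictness argument is concrete enough to run (a sketch; $\Lambda$ and the permutation are arbitrary illustrative choices):

```python
import numpy as np

n, m = 3, 4
lam = np.diag([3.0, 2.0, 1.0])        # distinct diagonal entries
P = np.eye(n)[[2, 1, 0]]              # permutation swapping the 1st and 3rd entries

# U_1 = ... = U_{m-1} = I, U_m = P, so the product is
# Lambda^{m-1} (P Lambda P^T)
prod = np.linalg.matrix_power(lam, m - 1) @ (P @ lam @ P.T)
S = np.trace(prod)
bound = np.trace(np.linalg.matrix_power(lam, m))
assert S < bound                      # strict, by the rearrangement inequality
print(S, "<", bound)
```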

If so, does anything change if $\Lambda$ is non-negative rather than positive?

No. The above argument in log-space would appear to require all singular values to be positive, but we can work around this by continuity. That is, suppose $\Lambda$ is diagonal, PSD, and singular. Fixing some $m$, consider for $j \in \mathbb N$

$A_k^{(j)} := A_k + \frac{1}{2^j}I$ which is PD for all $j$.

Running the above argument on $A_k^{(j)}$ for arbitrary $j$ tells us

$\text{trace}\Big(\Sigma_{A_1^{(j)}A_2^{(j)}...A_m^{(j)}}\Big) \leq \text{trace}\Big(\Sigma_{A_1^{(j)}}^m\Big)$
and using the continuity of the trace we take a limit of this sequence, i.e.
$\text{trace}\Big(\Sigma_{A_1A_2...A_m}\Big)=\lim_{j\to \infty}\text{trace}\Big(\Sigma_{A_1^{(j)}A_2^{(j)}...A_m^{(j)}}\Big) \leq \lim_{j\to \infty}\text{trace}\Big(\Sigma_{A_1^{(j)}}^m\Big)=\text{trace}\Big(\Sigma_{A_1}^m\Big)$
(and the strictness argument runs almost verbatim albeit in a block matrix form, attacking the non-singular block.)
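The singular (merely PSD) case can be checked directly as well (a sketch; the zero diagonal entry makes $\Lambda$ singular, and the random unitaries are again drawn via QR of a complex Gaussian):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_unitary(n, rng):
    # QR of a complex Gaussian with phase correction gives a random unitary
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

n, m = 4, 3
lam = np.diag([2.0, 1.0, 0.5, 0.0])   # PSD but singular Lambda
bound = np.trace(np.linalg.matrix_power(lam, m)).real

prod = np.eye(n, dtype=complex)
for _ in range(m):
    U = random_unitary(n, rng)
    prod = prod @ U @ lam @ U.conj().T
assert np.trace(prod).real <= bound + 1e-9   # bound survives the limit
```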