[Math] Summation of singular values

linear algebramatricessingular valuestrace

If $A \in \mathbb{R}^{m\times n}$, then show that $$\sum_{s=1}^{r}\sigma_s(A)=\text{max}\{\text{trace}(U^TAV): U \in \mathbb{R}^{m\times r}, V\in \mathbb{R}^{n\times r}, \text{and} \ U^TU=V^TV=I_r\}.$$

Where $\sigma_1(A)\geq\sigma_2(A)\geq…\geq\sigma_n(A)$ are the singular values of $A$.

My Idea: If we write the singular value decomposition $A=U_A\Sigma V_A^T$, then we have $$\text{trace}(U^TAV)=\text{trace}(U^TU_A\Sigma V_A^TV)=\text{trace}(\Sigma V_A^TVU^TU_A)=\sum_{s=1}^{r}\sigma_s(A)c_{ss}$$
where $c_{ss}$ is the $(s,s)$ entry of $V_A^TVU^TU_A$. If $V_A^TVU^TU_A$ is unitary then $c_{ss}\leq1$, from which we get the result. However we only know that $V^TV=I$ but $VV^T\neq I$. Similar for other matrices.

I couldn't show that it is unitary. Any help would be appreciated.

Best Answer

The basic fact is that if $W^T$ is an $r\times n$ matrix with orthogonal rows and $S=\operatorname{diag}(s_1,s_2,\ldots,s_n)$ has nonnegative and decreasing diagonal entries, the maximum of $\|W^TS\|_F$ is attained when $W^T=\pmatrix{I_r&0}$. This can be proved by the usual argument. Let $\Lambda=\operatorname{diag}(\lambda_1,\lambda_2,\ldots,\lambda_n)=I_r\oplus0$. Expand $W$ to an orthogonal matrix $Q$. Then $\|W^TS\|_F^2=\operatorname{tr}(\Lambda Q^TS^2Q)=\sum_{i,j}s_i^2\lambda_jq_{ij}^2$, which is a linear function in the entries of matrix $Q\circ Q$ (the entrywise square of $Q$). As $Q\circ Q$ is doubly stochastic, Birkhoff's Theorem dictates that the maximum value of $\sum_{i,j}s_i^2\lambda_jq_{ij}^2$ occurs at some permutation matrix $Q$. Since both $\Lambda$ and $S$ have nonnegative and decreasing diagonal entries, $Q=I_n$ is clearly a global maximiser. Therefore $W^T=\pmatrix{I_r&0}$ is a global maximiser.

Now, in your problem, by absorbing $U_A$ and $V_A^T$ into $U^T$ and $V$ respectively, we may assume that $A=\Sigma=\pmatrix{D&0}$ is already a singular value matrix. Therefore \begin{align} \max_{U,V}\operatorname{tr}(U^TAV) &=\max_{U,V}\operatorname{tr}\left(U^T\pmatrix{D&0}V\right)\\ &=\max_{U,V}\operatorname{tr}\left(U^TD^{1/2}\pmatrix{D^{1/2}&0_{m\times(n-m)}}V\right)\\ &\le\max_{U,V}\left\|U^TD^{1/2}\right\|_F \left\|\pmatrix{D^{1/2}&0_{m\times(n-m)}}V\right\|_F\tag{1}\\ &=\max_{U,V}\left\|U^TD^{1/2}\right\|_F \left\|\pmatrix{D^{1/2}&0_{m\times(n-m)}\\ 0_{(n-m)\times m}&0_{(n-m)\times(n-m)}}V\right\|_F.\tag{2} \end{align} By our aforementioned fact, the two Frobenius norms in $(2)$ attain maxima when $U^T=\pmatrix{I_r&0_{r\times(m-r)}}$ and $V^T=\pmatrix{I_r&0_{r\times(n-r)}}$. However, if we put $D_r=\operatorname{diag}(\sigma_1,\ldots,\sigma_r)$, we have $D^{1/2}U=\pmatrix{D^{1/2}&0_{m\times(n-m)}}V=\pmatrix{D_r^{1/2}\\ 0_{n\times r}}$. Therefore tie also occurs in the Cauchy-Schwarz inequality $(1)$. Hence this $(U,V)$ is a global maximiser and the maximum possible value of $\operatorname{tr}(U^TAV)$ is $\|D_r^{1/2}\|_F^2=\sum_{i=1}^r\sigma_i(A)$.

Related Question