The norm $\|S-Q\|_F$ where $Q$ is orthogonal is minimised by $Q=I$

inequalitylinear algebramatrix-normspositive-semidefinitesvd

Problem:

Suppose that $S$ is symmetric and semi-positive-definite. Let $\|\cdot \|_F$ be the Frobenius norm. Show that

$$\|S-I \|_F \leq \|S-Q\|_F$$

for all orthogonal matrices $Q$, where $I$ is the identity matrix.


Attempt:

So from what I know, the Frobenius norm is defined as

$$\|A\|_F:= \bigg(\sum_{i,j} a_{ij}^2\bigg)^{1/2}$$

and one of the properties of it is that $\|A\|_F = \big(\sum_i \sigma_i^2 \big)^{1/2}$ where $\sigma_i$ are the Singular Values of $A$.

Also, $\|QA\|_F = \|AQ\|_F = \|A\|_F$ for any orthogonal matrix $Q$.

Thus, if we consider the Singular Value Decomposition of $S$, say $S=UDV$ where $U,V$ are orthogonal and $D = \text{diag}(\sigma_1,\dots,\sigma_n)$, then

$$\|S-I\|_F = \|UDV – I\|_F = \|D – U^TV^T\|_F$$

but I feel like I'm not getting anyhere with this approach.

$\color{red}{\text{In particular,}}$ I don't really know how to use the fact that $S$ is symmetric and semi-positive-definite. Does this have any effect on the form of the SVD for $S$?

Any help would be much appreciated. Thanks!

Best Answer

Use the fact that $\|A\|_F^2 = \operatorname{Tr}(A^TA)$. In particular, we have $$ \|S-Q\|_F^2 = \operatorname{Tr}([S-Q]^T[S-Q]) = \|S\|_F^2 + n^2 - 2 \operatorname{Tr}(Q^TS) $$ where $n$ denotes the common size of the matrices $S,Q$. With that said, it's clear that we want an orthogonal matrix $Q$ that maximizes $\operatorname{Tr}(Q^TS)$.

Because $S$ is symmetric and positive semidefinite, there exists an orthogonal matrix $V$ and a non-negative diagonal matrix $D$ with $S = VDV^T$ (regarding your question about using the properties of $S$: note that this is an SVD). Note that $$ \operatorname{tr}(Q^TVDV^T) = \operatorname{tr}([V^TQ^TV]D) = \operatorname{tr}([V^TQV]^TD). $$ In other words, we want an orthogonal matrix $W = V^TQV$ that maximizes $\operatorname{tr}(W^TD)$. We see that this maximum is achieved with $W = I$. In particular, the entries of an orthogonal matrix must all be at most $1$. So, we have $$ \operatorname{tr}(W^TD) = \sum_{i=1}^n w_{ii}d_{ii} \leq \sum_{i=1}^n d_{ii} = \operatorname{tr}(D) = \operatorname{tr}(I^TD). $$ Note that the only $Q$ for which $W = I = V^TQV$ is given by $Q = I$. The conclusion follows.

Note: This problem is an instance of the orthogonal procrustes problem.

Related Question