[Math] Why the singular values have to appear in descending order across the diagonal matrix

linear algebralinear-transformationssvd

This question is about Singular Value Decomposition.

Given an arbitrary matrix $A$

$$A \in \mathbb{R}^{m\times n} $$

Its reduced SVD form is:-

$$A = U\Sigma V^T$$

Where as $\Sigma$ is the diagonal matrix containing scaling factors across its diagonal:-

$$ \Sigma = \left( \begin{array}{ccccc}
\sigma_1 & \hfill & \hfill & \hfill & \hfill \\
\hfill & \sigma_2 & \hfill & \hfill & \hfill \\
\hfill & \hfill & \ddots &\hfill & \hfill \\
\hfill & \hfill & \hfill & \hfill & \sigma_n \\
\end{array} \right)$$

Now, it is said that the singular values across the diagonal are such that:-

$$\sigma_1 \geq \sigma_2 \geq \sigma_3 \geq \cdots \geq \sigma_n \geq 0 $$

Why do these values have to be necessarily in descending order across the diagonal? I read that it's a convention to write so, but there is more to the order than just convention. If this order is changed, we will get a totally different matrix instead of $A$. So the order is important.

Or is it so that there is always guaranteed to be one unique solution for SVD in which these singular values across $\Sigma$ are guaranteed to be in descending order?

Best Answer

It is just convention. If you change the order, and permute the columns of $U$ and $V$ correspondingly, you do get the same matrix $A$.

Related Solutions

[Math] Singular values: smallest perturbation to make a matrix singular

I believe the largest singular value is the value of the 2-norm, but how can we relate this to the operator norm?

The largest singular value is the operator norm: $$\sigma_1=\max_{\|x\|=1}\|Ax\|=\|A\| \tag0$$ Here and below I use the Euclidean norm on vectors, and the corresponding operator norm on matrices.

The smallest singular value of $A$, denoted by $\sigma_n$ below, has a description similar to (0): $$\sigma_n=\min_{\|x\|=1}\|Ax\|=\min\{\|A-B\|: \operatorname{rank}B<n \} \tag1$$ Indeed, from the SVD decomposition you see that $$\min_{\|x\|=1}\|Ax\|^2 = \min_{\|y\|=1} \sum_k \sigma_k^2 y_k^2 = \sigma_n^2+ \min_{\|y\|=1} \sum_k (\sigma_k^2-\sigma_n^2) y_k^2$$ where the last sum is minimized by letting $y_n$ be the only nonzero component. I used the fact that the unitaries preserve the vector norm.

By the way, you can prove (0) by modifying the proof above.

As for the second part of (1), the inequality $$\min_{\|x\|=1}\|Ax\|\le \min\{\|A-B\|: \operatorname{rank}B<n \}\tag2$$ follows from the fact that $Bx=0$ for some unit vector $x$. The reverse inequality follows by taking $B=A\circ P$ where $P$ is the projection onto the orthogonal complement of vector $x$ that attains the minimum on the left side of (2). Indeed, $A-A\circ P=A\circ Q$ where $Q$ is the projection onto the span of $x$, hence $\|A\circ Q\|=\|Ax\|$.

In particular, (1) implies that $\sigma_n(A)$ is a $1$-Lipschitz function of $A$ in the operator norm, namely $$|\sigma_n(A)-\sigma_n(B)|\le \|A-B\| \tag3$$ Where the right hand side is $\sigma_1(A-B)$, by (0).

In fact, (3) holds for all singular values, because $\sigma_k$ is the distance from $A$ to the set of operators of rank less than $k$. See Wikipedia: Singular value.

Linear Algebra – Investigating Conditions on Eigenvalues Under Orthogonal Multiplication

I assume $\sigma_1\neq 0$, which means that $\Sigma$ is invertible. Let $A=Q\Sigma^{-1}.$ Then $Q=A\Sigma$ and $Q^T=\Sigma A^T.$

We want $Q^T \Sigma$ to be symmetric, which means $Q^T\Sigma = \Sigma Q$ or $\Sigma A^T \Sigma = \Sigma A\Sigma.$ If we multiply this with $\Sigma^{-1}$ from both sides, we get $A=A^T,$ so $A$ is symmetric.

$Q$ is orthogonal, which means $Q^TQ=I$ or $$ \Sigma A^2 \Sigma = I $$ or $$ A^2 = \Sigma^{-2} $$ So we are looking for a square root of $\Sigma^{-2}$ and the problem boils down to the question if $\Sigma^{-1}$ is the only valid choice.

We must consider the case that $\Sigma$ has eigenvalues with multiplicity of more than $1.$

Let $\sigma_{r_i} = \sigma_{r_i+1} = \ldots = \sigma_{r_{i+1}-1}$ for $i=1,\ldots,m$ and $r_1=1,$ $r_2=2$ and $r_{m+1}=n+1.$ Furthermore, $\sigma_{r_i}<\sigma_{r_{i+1}}$ for $i=1,\ldots,m-1.$ Then each square root of $\Sigma^{-2}$ can be written as follows $$ A = \begin{pmatrix} \sigma_{r_1}^{-1} B_1 & & & & 0 \\ & \sigma_{r_2}^{-1} B_2 & & & \\ & & \sigma_{r_3}^{-1} B_3 & & \\ & & & \ddots & \\ 0 & & & & \sigma_{r_m}^{-1} B_m \end{pmatrix} \;\;,\;\; B_i^2 = I\;\;\mbox{for}\;\; i=1,\ldots,m $$ where $B_i$ are blocks of size $(r_{i+1}-r_i)\times (r_{i+1}-r_i).$ (The proof is given below)

Then $$ Q = \begin{pmatrix} B_1 & & & & 0 \\ & B_2 & & & \\ & & B_3 & & \\ & & & \ddots & \\ 0 & & & & B_m \end{pmatrix} $$ The $B_i$ are symmetric. $B_i^T$ is the inverse of $B_i$ because of the orthogonality of $Q$, and $B_i$ is also the inverse of $B_i$, because of the property $B_i^2=I.$ Therefore $B_i^T=B_i$ and $$ Q^T\Sigma = \begin{pmatrix} \sigma_{r_1}B_1 & & & & 0 \\ & \sigma_{r_2}B_2 & & & \\ & & \sigma_{r_3}B_3 & & \\ & & & \ddots & \\ 0 & & & & \sigma_{r_m}B_m \end{pmatrix} $$ We want $Q^T\Sigma$ to have the same eigenvalues as $\Sigma,$ which in turn means that $\sigma_{r_i}B_i$ has $\sigma_{r_i}$ as its only eigenvalue. A symmetric matrix with only one eigenvalue must be a scalar multiple of the identity matrix. Therefore, $B_i = I$ for $i,\ldots,m,$ which completes the proof.

Proof sketch for $\sigma_1=0$

If $\sigma_1=0,$ it can easily be shown that $Q_{11}\in\{-1,1\}$ and $Q_{1j}=Q_{j1}=0$ for $j=2,\ldots,n.$ This can be concluded from the symmetry of $Q^T\Sigma$ and from the orthogonality of $Q.$

This means that we can follow the argument from the first part of the proof, but consider only the subspace that is orthogonal to $e_1.$ Basically, this means that we ignore the first row and first column of all $n\times n$ matrices. In the end, we have to decide if $Q_{11}=1$ or $Q_{11}=-1.$ As $Q\in \mathrm{SO}(n)$ and $B_i=I$ for $i=2,\ldots,m,$ we can conclude $Q_{11}=1.$

Diagonalizable square roots of diagonal matrices

Let $A$ be diagonalizable and $A^2$ diagonal. Without loss of generality, the diagonal elements of $A^2$ are sorted in ascending order. Let $0\leq\lambda_1 < \lambda_2 < \ldots < \lambda_m$ such that the eigenvalues of $A$ form a (not necessarily strict) subset of $\{\lambda_1,\;-\lambda_1,\;\lambda_2,\;-\lambda_2,\;\ldots,\;\lambda_m,\;-\lambda_m\}.$ Let $t_i^{+}$ be the algebraic and geometric multiplicity of $\lambda_i$ and $t_i^{-}$ the algebraic and geometric multiplicity of $-\lambda_i$ within the matrix $A$ (we set $t_1^{-}=0$ if $\lambda_1=0.$) Let $r_1=1$ and $r_{i+1} = r_i + t_i^{+}+ t_i^{-}.$

If $Av = \lambda v$ and $Aw = -\lambda w,$ then $A^2 (v+w) = A^2 v + A^2 w =\lambda^2 v + (-\lambda)^2 w = \lambda^2 (v+w).$ This means that the eigenspace of $A^2$ with respect to the eigenvalue $\lambda^2$ is the direct sum of the eigenspaces of $A$ with respect to the eigenvalues $\lambda$ and $-\lambda.$

As $A$ is diagonalizable, the direct sum of the eigenspaces $E_{A,\lambda_1},$ $E_{A,-\lambda_1}$, $E_{A,\lambda_2},$ $E_{A,-\lambda_2},\ldots$, $E_{A,\lambda_m},$ $E_{A,-\lambda_m}$, forms the complete vector space $\mathbb{R}^n.$ This means that each of the eigenspaces of $A^2$ can be written as $E_{A,\lambda_i} \oplus E_{A,-\lambda_i}.$ In a manner of speaking, there is no room for other eigenspaces than those.

We know the eigenspaces of $A^2,$ because $A^2$ is diagonal. We have \begin{eqnarray*} E_{A^2,\lambda_1^2} & = & E_{A,\lambda_1} \oplus E_{A,-\lambda_1} = \mathrm{span}\{e_{r_1},\ldots,e_{r_2-1}\} \\ & \vdots & \\ E_{A^2,\lambda_m^2} & = & E_{A,\lambda_m} \oplus E_{A,-\lambda_m} = \mathrm{span}\{e_{r_m},\ldots,e_{r_{m+1}-1}\} \end{eqnarray*} with the standard basis $e_1,\ldots,e_n.$ Now it is clear that $A$ can be diagonalized by means of a block matrix, because each $E_{A,\lambda_i} \oplus E_{A,-\lambda_i}$ is spanned by the related elements of the standard basis. $$ A= \begin{pmatrix} S_1 & & 0 \\ & \ddots & \\ 0 & & S_m \end{pmatrix} \begin{pmatrix} \lambda_1 I_{t_1^{+}} & & & & 0 \\ & -\lambda_1 I_{t_1^{-}} & & & \\ & & \ddots & & \\ & & & \lambda_m I_{t_m^{+}} & \\ 0 & & & & -\lambda_m I_{t_m^{-}} \end{pmatrix} \begin{pmatrix} S_1 & & 0 \\ & \ddots & \\ 0 & & S_m \end{pmatrix} ^{-1} $$ From this, by simply processing the matrix multiplication, we can conclude that $A$ itself is also a block matrix of the same sort, i.e. $$ A = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix} $$ with $$ A_i = S_i\,\begin{pmatrix} \lambda_i I_{t_i^{+}} & \\ & -\lambda_i I_{t_i^{-}} \\ \end{pmatrix} \, S_i^{-1} $$ Now we only have to show that $A_i = \lambda_i B_i$ with $B_i^2=I.$

Let $T_i=S_i^{-1}$.

Let $S_i^{+}$ be the $(t_i^{+}+t_i^{-})\times t_i^{+}$ matrix that is formed by the first $t_i^{+}$ columns of $S_i$ and $S_i^{-}$ the $(t_i^{+}+t_i^{-})\times t_i^{-}$ matrix that is formed by the last $t_i^{-}$ columns of $S_i.$ Let $T_i^{+}$ be the $t_i^{+}\times (t_i^{+}+t_i^{-})$ matrix that is formed by the first $t_i^{+}$ rows of $T_i$ and $T_i^{-}$ the $t_i^{-}\times (t_i^{+}+t_i^{-})$ matrix that is formed by the last $t_i^{-}$ rows of $T_i.$

Then $T_i^{+}S_i^{+}=I,\;\;T_i^{-}S_i^{-}=I,\;\;T_i^{+}S_i^{-}=0,\;\;T_i^{-}S_i^{+}=0$. $$ A_i = S_i^{+}\lambda_i T_i^{+} + S_i^{-}(-\lambda_i) T_i^{-} = \lambda_i \left( S_i^{+}T_i^{+} - S_i^{-}T_i^{-}\right) $$ Let $B_i = S_i^{+}T_i^{+} - S_i^{-}T_i^{-}.$ Then \begin{eqnarray*} B_i^2 & = & \left( S_i^{+}T_i^{+} - S_i^{-}T_i^{-}\right)\left( S_i^{+}T_i^{+} - S_i^{-}T_i^{-}\right) \\ & =& S_i^{+}T_i^{+}S_i^{+}T_i^{+}-S_i^{+}T_i^{+}S_i^{-}T_i^{-}-S_i^{-}T_i^{-}S_i^{+}T_i^{+}+S_i^{-}T_i^{-}S_i^{-}T_i^{-} \\ & =& S_i^{+}\cdot I\cdot T_i^{+}-S_i^{+}\cdot 0 \cdot T_i^{-}-S_i^{-}\cdot 0 \cdot T_i^{+}+S_i^{-}\cdot I\cdot T_i^{-} \\ & =& S_i^{+}T_i^{+}+S_i^{-}T_i^{-} \\ & =& \begin{pmatrix} & & \\ S_i^{+} & & S_i^{-} \\ & & \end{pmatrix} \begin{pmatrix} & T_i^{+} & \\ & & \\ & T_i^{-} & \end{pmatrix} =S_iT_i = I \end{eqnarray*}

Best Answer

Related Solutions

[Math] Singular values: smallest perturbation to make a matrix singular

Linear Algebra – Investigating Conditions on Eigenvalues Under Orthogonal Multiplication

Related Question