[Math] Is orthonormality of the basis vectors mandatory for SVD?

linear-algebra, matrix-decomposition, svd

I have been trying to understand the Singular Value Decomposition (SVD) intuitively, and broadly speaking I tend to think of it as a generalisation of the Eigenvalue Decomposition (EVD): transform a linear map into a simple scaling, provided the corresponding changes of basis are performed, but this time between spaces of different dimensions, namely from $\mathbb{R}^m$ to $\mathbb{R}^n$.

Mathematically speaking, for an $n \times m$ matrix $\mathbf A$ we want to find the $v_i$'s and $u_i$'s such that:

$$A v_i=\sigma_i u_i$$

where:
$$v_i \in \{v_1, v_2, \dots, v_m\}$$
$$u_i \in \{u_1, u_2, \dots, u_n\}$$

However, I still do not get why $\{v_1, v_2, \dots, v_m\}$ and $\{u_1, u_2, \dots, u_n\}$ must be orthonormal bases, as is assumed by all the resources I have found, which jump right into finding these orthonormal bases by computing the eigenvectors of $\mathbf A^\top \mathbf A$.

With that said, are orthonormal bases a mandatory requirement for SVD? If so, what would go wrong if you just picked two bases of linearly independent vectors that are not necessarily orthonormal?

Best Answer

This is a great question. Nothing will go wrong if you insist on working with arbitrary (not necessarily orthonormal) bases; it just won't lead to the SVD but to a different, more trivial, yet still sometimes useful, decomposition.

Let's assume we start with an $n \times m$ real matrix $A$ and think about it as (defining a) linear map $T_A \colon \mathbb{R}^m \rightarrow \mathbb{R}^n$ given by $T_A(v) = Av$. You can ask two different questions:

  1. How can we pick a basis $\alpha = (v_1, \dots, v_m)$ for $\mathbb{R}^m$ and a basis $\beta = (w_1, \dots, w_n)$ for $\mathbb{R}^n$ so that, with respect to those bases, the map $T_A$ is represented by the simplest possible matrix? It turns out that the answer to this question is quite simple. If the rank of the matrix $A$ is $k$, you can pick $\alpha$ and $\beta$ so that $Av_i = w_i$ for $1 \leq i \leq k$ and $Av_i = 0$ for $k < i \leq m$. With respect to those bases, the map $T_A$ is represented by a rectangular diagonal matrix with $k$ ones on the diagonal and zeros everywhere else. In this case there are no "singular values", only ones and zeros.

    How do you do it in practice? You can always perform a finite number of row and column operations on $A$ until it becomes a rectangular diagonal matrix with $k$ ones on the diagonal and all other entries zero. Since performing row/column operations corresponds to multiplying the matrix $A$ on the left/right by elementary matrices, this gives you a decomposition of the form $P^{-1}AQ = D$ where $P, Q$ are invertible matrices and $D$ is the "diagonal" matrix. Multiplying by $P$, you get the equation $AQ = PD$. The columns of $Q$ form the basis $\alpha$ while the columns of $P$ form the basis $\beta$. Multiplying by $Q^{-1}$, you get the decomposition $A = PDQ^{-1}$, which looks similar to the SVD, only here the matrices $P$ and $Q$ are not necessarily orthogonal, because we didn't insist on orthonormal bases, and the entries on the diagonal of $D$ are only $1$'s and $0$'s. (A numerical sketch of this procedure is given after this list.)

    Finally, it is instructive to think about the case where $n = m$ and the matrix $A$ has full rank. Then we can pick any basis $\alpha = (v_1, \dots, v_n)$ for $\mathbb{R}^n$, take the basis $\beta = (w_1, \dots, w_n)$ to be $w_i := Av_i$, and then trivially $Av_i = w_i$. If we take $\alpha$ to be the standard basis, so that $Q = I$ and $P = A$, we get the "trivial" decomposition $A = PDQ^{-1} = A \cdot I \cdot I^{-1}$. (This case is also illustrated after the list.)

  2. How can we pick an orthonormal basis $\alpha = (v_1, \dots, v_m)$ for $\mathbb{R}^m$ and an orthonormal basis $\beta = (w_1, \dots, w_n)$ for $\mathbb{R}^n$ so that, with respect to those bases, the map $T_A$ is represented by the simplest possible matrix? This is a different question, and it is answered by the SVD. It turns out that you can always find bases $\alpha, \beta$ so that $Av_i = \sigma_i w_i$, and if the rank of the matrix is $k$ you can arrange that $\sigma_i = 0$ for $k < i \leq m$ and $\sigma_i > 0$ for $1 \leq i \leq k$, but now you can't make all the $\sigma_i$'s equal to one! These are the singular values of $A$, and they provide interesting geometric information about the linear map $T_A$. For example, $T_A$ maps an $m$-dimensional cube of volume one to a $k$-dimensional parallelotope of ($k$-dimensional) volume $\sigma_1 \cdots \sigma_k$.

    If you stack the vectors $v_i$ as columns of a matrix $Q$ and the vectors $w_i$ as columns of a matrix $P$, you get the decomposition $A = P \Sigma Q^{-1} = P \Sigma Q^\top$ where now $P$ and $Q$ are orthogonal matrices, not just invertible ones, and $\Sigma$ is a rectangular diagonal matrix with the non-zero entries $\sigma_1, \dots, \sigma_k$ on the diagonal. (A numerical check appears after the list.)

    Finally, let us think about the case where $n = m$ and the matrix $A$ has full rank. We can try to take $\alpha = (v_1, \dots, v_n)$ to be the standard basis $v_i = e_i$, which is orthonormal. Then $(Ae_1, \dots, Ae_n)$ will be a basis, but not necessarily an orthogonal one, so we can't just take $w_i = Av_i$.
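To make the first construction concrete, here is a minimal NumPy sketch (the function name `rank_normal_form` and the example matrix are my own, not from the original answer). It performs row and column operations on $A$ while accumulating them into invertible matrices, and returns $P$, $D$, $Q$ with $A = PDQ^{-1}$ and $D$ a rectangular diagonal matrix of ones and zeros:

```python
import numpy as np

def rank_normal_form(A, tol=1e-10):
    """Reduce A to D = E A F (ones and zeros on the diagonal) by
    elementary row ops (accumulated in E) and column ops (in F).
    Returns P = E^{-1}, D, Q = F, so that A = P D Q^{-1}."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    D, E, F = A.copy(), np.eye(n), np.eye(m)
    r = 0
    while r < min(n, m):
        # find a pivot in the untouched lower-right block
        nz = np.argwhere(np.abs(D[r:, r:]) > tol)
        if nz.size == 0:
            break
        i, j = nz[0] + r
        # bring the pivot to position (r, r) by row/column swaps
        D[[r, i], :], E[[r, i], :] = D[[i, r], :], E[[i, r], :]
        D[:, [r, j]], F[:, [r, j]] = D[:, [j, r]], F[:, [j, r]]
        # scale the pivot row so the pivot becomes 1
        E[r, :] /= D[r, r]
        D[r, :] /= D[r, r]
        # clear the pivot column with row ops ...
        for i2 in range(n):
            if i2 != r:
                E[i2, :] -= D[i2, r] * E[r, :]
                D[i2, :] -= D[i2, r] * D[r, :]
        # ... and the pivot row with column ops
        for j2 in range(m):
            if j2 != r:
                F[:, j2] -= D[r, j2] * F[:, r]
                D[:, j2] -= D[r, j2] * D[:, r]
        r += 1
    return np.linalg.inv(E), D, F

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])                      # rank 1
P, D, Q = rank_normal_form(A)
print(np.round(D, 10))                            # [[1. 0. 0.], [0. 0. 0.]]
print(np.allclose(A, P @ D @ np.linalg.inv(Q)))   # True
```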
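The "trivial" full-rank case from the end of item 1 is even shorter to check numerically. This is a sketch under the assumption that the random matrices are invertible, which holds with probability one:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))    # full rank with probability one
Q = rng.normal(size=(3, 3))    # columns: an arbitrary basis alpha
P = A @ Q                      # columns: w_i = A v_i, the basis beta
D = np.eye(3)                  # T_A is represented by the identity
print(np.allclose(A, P @ D @ np.linalg.inv(Q)))  # True: A = P D Q^{-1}
```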
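And here is a numerical check of item 2 using `numpy.linalg.svd` (the example matrix is arbitrary): it verifies that $P$ and $Q$ are orthogonal, that $Av_i = \sigma_i w_i$, and that the product of the singular values matches the volume-scaling claim via $\sqrt{\det(A^\top A)}$ in the full-column-rank case:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))                  # n = 4, m = 3, rank 3

# NumPy returns A = U @ Sigma @ Vt; the columns of U are the w_i,
# the columns of Vt.T are the v_i, and s holds the singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)
P, Q = U, Vt.T

print(np.allclose(A, P @ Sigma @ Q.T))       # A = P Sigma Q^T
print(np.allclose(P.T @ P, np.eye(4)))       # beta is orthonormal
print(np.allclose(Q.T @ Q, np.eye(3)))       # alpha is orthonormal
for i in range(3):
    print(np.allclose(A @ Q[:, i], s[i] * P[:, i]))   # A v_i = sigma_i w_i

# volume scaling: sigma_1 * sigma_2 * sigma_3 = sqrt(det(A^T A))
print(np.isclose(np.prod(s), np.sqrt(np.linalg.det(A.T @ A))))
```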
