[Math] Solution to least squares problem using the Singular Value Decomposition

least squares, linear algebra

Let $A_{m\times n}$, $m\geq n$, have full column rank and let $A=U_1 \Sigma V^T$ be its reduced singular value decomposition. Show that the linear least squares problem $\min_{x \in \mathbb{R}^n} \|y-Ax\|_2$ is solved at $x= V \Sigma^{-1} U_1^T y$.

Here is my attempt, just want to make sure it is correct.

Since $A$ has full column rank, there is a unique solution, and that solution satisfies the normal equations, i.e. $A^Ty= A^TA x$.
So if we can show that the given $x$ makes the right-hand side of the equation equal to the left, then we are done.

$A^Ty= V \Sigma^T U_1^Ty$ and
$A^TAx=V \Sigma^T U_1^T U_1 \Sigma V^T V \Sigma^{-1} U_1^Ty= V \Sigma^T U_1^Ty$ since $V^TV=I$ and $U_1^TU_1=I$.

Hence both sides are equal and the normal equations are satisfied, so $x$ is the solution.
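As a quick sanity check (a minimal numerical sketch with randomly generated data, not part of the proof), the candidate $x$ can be tested against the normal equations using NumPy's reduced SVD:

```python
import numpy as np

# Sketch: verify numerically that x = V * Sigma^{-1} * U1^T * y satisfies
# the normal equations A^T y = A^T A x for a random full column rank A.
rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))           # almost surely full column rank
y = rng.standard_normal(m)

# Reduced SVD: U1 is m x n, s holds the n singular values, Vt is V^T (n x n).
U1, s, Vt = np.linalg.svd(A, full_matrices=False)
x = Vt.T @ np.diag(1.0 / s) @ U1.T @ y

print(np.allclose(A.T @ y, A.T @ A @ x))  # expected: True
```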

I am just wondering whether we can use $U_1^T U_1 = I$ and $V^TV = I$ even though it is the reduced singular value decomposition. Do those still hold?

Best Answer

We use the singular value decomposition to solve the full column rank linear system $$ \mathbf{A} x = b $$ where $$ \mathbf{A}\in\mathbb{C}^{m\times n}_{n}, \quad x \in\mathbb{C}^{n}, \quad b \in\mathbb{C}^{m}. $$ By construction, $m\ge n$ and the matrix rank $\rho=n$.

Singular value decomposition

The singular value decomposition is guaranteed to exist: $$ \mathbf{A} = \mathbf{U} \, \Sigma \, \mathbf{V}^{*} = \left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}} \end{array} \right] % \left[ \begin{array}{c} \mathbf{S} \\ \mathbf{0} \end{array} \right] % \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} \end{array} \right]. $$ Because the matrix has full column rank the null space $\color{red}{\mathcal{N}\left( \mathbf{A}\right)}$ is trivial.

The codomain matrix $\mathbf{U}\in\mathbb{C}^{m\times m}$ and the domain matrix $\mathbf{V}\in\mathbb{C}^{n\times n}$ are unitary: $$ \mathbf{U}^{*}\mathbf{U} = \mathbf{U}\,\mathbf{U}^{*} = \mathbf{I}_{m}, \quad \mathbf{V}^{*}\mathbf{V} = \mathbf{V}\,\mathbf{V}^{*} = \mathbf{I}_{n}. $$

The column vectors of $\color{blue}{\mathbf{U}_{\mathcal{R}}}\in\mathbb{C}^{m\times\rho}$ are an orthonormal span of $\color{blue}{\mathcal{R}}(\mathbf{A})$; the column vectors of $\color{blue}{\mathbf{V}_{\mathcal{R}}}\in\mathbb{C}^{n\times\rho}$ are an orthonormal span of $\color{blue}{\mathcal{R}}(\mathbf{A}^{*})$. Similarly, the column vectors of $\color{red}{\mathbf{U}_{\mathcal{N}}}\in\mathbb{C}^{m\times(m-\rho)}$ are an orthonormal span of $\color{red}{\mathcal{N}(\mathbf{A^{*}})}$. The $\rho$ singular values are ordered and real: $$ \sigma_{1} \ge \sigma_{2} \ge \dots \ge \sigma_{\rho}>0. $$ These singular values form the diagonal matrix of singular values $$ \mathbf{S} = \text{diagonal} (\sigma_{1},\sigma_{2},\dots,\sigma_{\rho}) \in\mathbb{R}^{\rho\times\rho}. $$ The $\mathbf{S}$ matrix is embedded in the sabot matrix $\Sigma\in\mathbb{R}^{m\times n}$, whose shape ensures conformality.
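As a concrete illustration (a sketch only; the names U_R, U_N, S, and Sigma are assumptions chosen to match the notation above), NumPy's full SVD exposes exactly these blocks:

```python
import numpy as np

# Sketch: split the full SVD into the range block U_R and null-space block U_N,
# and rebuild the sabot matrix Sigma with S embedded in its top block.
rng = np.random.default_rng(1)
m, n = 6, 2
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A, full_matrices=True)   # U: m x m, Vh: n x n
rho = n                                           # full column rank => rank = n
U_R, U_N = U[:, :rho], U[:, rho:]                 # spans of R(A) and N(A*)
S = np.diag(s)                                    # rho x rho matrix of singular values

Sigma = np.zeros((m, n))
Sigma[:rho, :rho] = S
print(np.allclose(A, U @ Sigma @ Vh))             # expected: True
```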

Least squares solution

Consider the case where the data vector has a component in the null space $\color{red}{\mathcal{N}\left( \mathbf{A}^{*} \right)}$, that is, $b\notin\color{blue}{\mathcal{R}}(\mathbf{A})$, so the linear system $$ \mathbf{A} x - b = \mathbf{0} $$ has no exact solution. Generalize the question and ask for the best solution vector $x$, the one which minimizes the difference $r^{2}(x) = \lVert \mathbf{A} x - b \rVert_{2}^{2}$ in the $2$-norm. The least squares minimizers are defined as $$ x_{LS} = \left\{ x \in \mathbb{C}^{n} \colon \big\lVert \mathbf{A} x - b \big\rVert_{2}^{2} \text{ is minimized} \right\}. $$

Exploit SVD: separate range and null spaces

The two norm is invariant under unitary transformations; for example, $$ \lVert \mathbf{V} x \rVert_{2} = \lVert x \rVert_{2}. $$ This provides the freedom to transform problems into a form that is easier to manipulate. In this case, $$ \lVert \mathbf{U}^{*} \mathbf{A}x - \mathbf{U}^{*} b \rVert_{2}^{2} = \lVert \Sigma \mathbf{V}^{*} x - \mathbf{U}^{*} b \rVert_{2}^{2}. $$ Switching to the block form which separates range and null spaces, $$ \begin{align} r^{2}(x) = \big\lVert \Sigma\, \mathbf{V}^{*} x - \mathbf{U}^{*} b \big\rVert_{2}^{2} &= \Bigg\lVert \left[ \begin{array}{c} \mathbf{S} \\ \mathbf{0} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} \end{array} \right] x - \left[ \begin{array}{c} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \\[4pt] \color{red}{\mathbf{U}_{\mathcal{N}}}^{*} \end{array} \right] b \Bigg\rVert_{2}^{2} \\ &= \big\lVert \mathbf{S} \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} x - \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} b \big\rVert_{2}^{2} + \big\lVert \color{red}{\mathbf{U}_{\mathcal{N}}}^{*} b \big\rVert_{2}^{2} \end{align} $$
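A short numerical sketch (random data, same assumed names as above) confirms that the squared residual splits into a range-space term and a null-space term:

```python
import numpy as np

# Sketch: because U is unitary, ||Ax - b||^2 = ||S V* x - U_R* b||^2 + ||U_N* b||^2.
rng = np.random.default_rng(2)
m, n = 7, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)                         # any trial vector

U, s, Vh = np.linalg.svd(A, full_matrices=True)
U_R, U_N, S = U[:, :n], U[:, n:], np.diag(s)

lhs = np.linalg.norm(A @ x - b) ** 2
rhs = (np.linalg.norm(S @ Vh @ x - U_R.T @ b) ** 2
       + np.linalg.norm(U_N.T @ b) ** 2)
print(np.isclose(lhs, rhs))                        # expected: True
```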

Moore-Penrose pseudoinverse

By varying the solution vector $x$ we can control the range space term and force it to $0$: $$ \mathbf{S}\, \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} x - \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} b = 0 \qquad \Rightarrow \qquad x = \color{blue}{\mathbf{V}_{\mathcal{R}}} \mathbf{S}^{-1} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*}b. $$ This is the point solution, the solution of least norm: $$ x_{LS} = \mathbf{A}^{\dagger} b = \color{blue}{\mathbf{V}_{\mathcal{R}}} \mathbf{S}^{-1} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*}b $$ using the thin SVD.

The error term cannot be removed, leaving a residual error: $$ r^{2}\left(x_{LS}\right) = \big\lVert \color{red}{\mathbf{U}_{\mathcal{N}}}^{*} b \big\rVert_{2}^{2}, $$ which quantifies how much of the data vector $b$ resides in the null space $\color{red}{\mathcal{N}\left( \mathbf{A}^{*} \right)}$.
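For reference, a hedged numerical sketch (random data, assumed variable names) compares this thin-SVD formula with numpy.linalg.lstsq and checks that the minimal residual equals $\lVert \color{red}{\mathbf{U}_{\mathcal{N}}}^{*} b \rVert_{2}^{2}$:

```python
import numpy as np

# Sketch: x_LS = V_R S^{-1} U_R^* b, compared against numpy's least squares solver.
rng = np.random.default_rng(3)
m, n = 9, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

U, s, Vh = np.linalg.svd(A, full_matrices=True)
U_R, U_N = U[:, :n], U[:, n:]

x_ls = Vh.T @ np.diag(1.0 / s) @ U_R.T @ b         # x_LS = V_R S^{-1} U_R^* b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_ls, x_ref))                    # expected: True
print(np.isclose(np.linalg.norm(A @ x_ls - b) ** 2,
                 np.linalg.norm(U_N.T @ b) ** 2))  # residual = ||U_N* b||^2: True
```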

Orthogonality

Your question raises a point at times overlooked: the expression of orthogonality for the domain matrices. To be clear, $$ \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} \end{array} \right] = \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} \end{array} \right] \left[ \begin{array}{c} \color{blue}{\mathbf{V}_{\mathcal{R}}} \end{array} \right] = \mathbf{I}_{n}, $$ and $$ \left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}} \end{array} \right] \left[ \begin{array}{l} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \\ \color{red}{\mathbf{U}_{\mathcal{N}}}^{*} \end{array} \right] = \left[ \begin{array}{l} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \\ \color{red}{\mathbf{U}_{\mathcal{N}}}^{*} \end{array} \right] \left[ \begin{array}{cc} \color{blue}{\mathbf{U}_{\mathcal{R}}} & \color{red}{\mathbf{U}_{\mathcal{N}}} \end{array} \right] = \mathbf{I}_{m} $$ Keep in mind that when restricted to the range space component we get an identity matrix in only one direction: $$ \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \color{blue}{\mathbf{U}_{\mathcal{R}}} = \mathbf{I}_{\rho}, $$ $$ \color{blue}{\mathbf{U}_{\mathcal{R}}} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \ne \mathbf{I}_{m}. $$
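A quick sketch (random data) illustrates the one-sided identity: $\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \color{blue}{\mathbf{U}_{\mathcal{R}}}$ is the $\rho\times\rho$ identity, while $\color{blue}{\mathbf{U}_{\mathcal{R}}} \color{blue}{\mathbf{U}_{\mathcal{R}}}^{*}$ is only the orthogonal projector onto $\color{blue}{\mathcal{R}}(\mathbf{A})$:

```python
import numpy as np

# Sketch: U_R^* U_R = I_rho, but U_R U_R^* != I_m (it is a projector onto R(A)).
rng = np.random.default_rng(4)
m, n = 5, 2
A = rng.standard_normal((m, n))
U_R, s, Vh = np.linalg.svd(A, full_matrices=False)  # U_R: m x n

print(np.allclose(U_R.T @ U_R, np.eye(n)))          # expected: True
print(np.allclose(U_R @ U_R.T, np.eye(m)))          # expected: False
```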
