For the full rank least squares problem, where $A \in \mathbb{K}^{m \times n}$, $m>n=\mathrm{rank}(A)$ ($\mathbb{K}$ is the base field), the solution is $(A^T A)^{-1} A^T b$. Numerically, this is a very bad way to approach the problem, for condition number reasons: forming $A^T A$ roughly squares the condition number, so a relatively tractable problem with $\kappa=10^8$ becomes a hopelessly intractable problem with $\kappa=10^{16}$ (where we think about tractability in double precision floating point). The condition number also enters into convergence rates for certain iterative methods, so such methods often perform poorly on the normal equations.
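To see the squaring concretely, here is a minimal NumPy sketch (the sizes and the target condition number $10^6$ are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a 50x10 matrix with condition number ~1e6 by prescribing
# geometrically decaying singular values.
U, _ = np.linalg.qr(rng.standard_normal((50, 10)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)))
s = np.logspace(0, -6, 10)              # singular values from 1 down to 1e-6
A = U @ np.diag(s) @ V.T

print(np.linalg.cond(A))                # ~1e6
print(np.linalg.cond(A.T @ A))          # ~1e12: the condition number squares
```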
The SVD pseudoinverse is exactly the same as the normal equations pseudoinverse, i.e. $(A^T A)^{-1} A^T$. You simply compute it using the SVD $A = U \Sigma V^T$ and simplify. There is indeed a simplification; the end result is
$$(A^T A)^{-1} A^T = V (\Sigma^T \Sigma)^{-1} \Sigma^T U^T.$$
This means that if I know the singular vector matrices $U$ and $V$, then I can transform the problem of finding the pseudoinverse of $A$ into the (trivial) problem of finding the pseudoinverse of $\Sigma$.
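A quick numerical check of this identity (a sketch with arbitrary sizes; only standard `np.linalg` routines are used):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))                  # full column rank (generically)

U, s, Vt = np.linalg.svd(A, full_matrices=True)  # A = U @ Sigma @ Vt

# Pseudoinverse of Sigma, i.e. (Sigma^T Sigma)^{-1} Sigma^T: invert the
# nonzero singular values and transpose the shape.
Sigma_pinv = np.zeros((3, 8))
np.fill_diagonal(Sigma_pinv, 1.0 / s)

lhs = np.linalg.inv(A.T @ A) @ A.T               # normal-equations pseudoinverse
rhs = Vt.T @ Sigma_pinv @ U.T                    # V (Sigma^T Sigma)^{-1} Sigma^T U^T
print(np.allclose(lhs, rhs))                     # True
```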
The above is for the full rank problem. For the rank deficient problem with $m>n>\mathrm{rank}(A)$, the LS solution is not unique; in particular, $A^T A$ is not invertible. The usual convention is to pick the solution of minimal Euclidean norm (I don't really know exactly why people do this, but you do need some criterion). It turns out that the SVD pseudoinverse gives you this minimal norm solution. Note that the SVD pseudoinverse still makes sense here, although it does not take the form I wrote above, since $\Sigma^T \Sigma$ is no longer invertible either. But you still obtain it in basically the same way (invert the nonzero singular values, leave the zeros alone).
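A sketch of the rank-deficient case (the rank-2 factorization is just a convenient way to manufacture a deficient matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2, not 4
b = rng.standard_normal(6)

# np.linalg.pinv inverts only the singular values above a cutoff (rcond)
# and leaves the (numerically) zero ones alone.
x_pinv = np.linalg.pinv(A) @ b

# lstsq also returns the minimal-norm least squares solution.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_pinv, x_lstsq))   # True: both give the minimal norm solution
```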
One nice thing about considering the rank-deficient problem is that even in the full rank case, if $A$ has a "gap" in its singular values, one can discard the singular values below the gap and obtain a good approximate solution to the full rank least squares problem. The SVD is the ideal method for elucidating this.
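A sketch of exploiting such a gap (the gap location and the magnitudes are contrived for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((30, 5)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
s = np.array([3.0, 2.0, 1.0, 1e-9, 1e-10])   # a clear gap after the third value
A = U @ np.diag(s) @ V.T
b = rng.standard_normal(30)

# Truncate: keep only the singular values above the gap.
k = 3
x_trunc = V[:, :k] @ ((U[:, :k].T @ b) / s[:k])
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)

# The truncated residual is only marginally larger, while the solution
# norm drops from roughly 1e9 to order one.
print(np.linalg.norm(A @ x_trunc - b), np.linalg.norm(A @ x_full - b))
print(np.linalg.norm(x_trunc), np.linalg.norm(x_full))
```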
The homogeneous problem is sort of unrelated to least squares; it is really an eigenvector problem which should be understood using different methods entirely (the minimizer of $\lVert Ax\rVert_2$ over unit vectors $x$ is the right singular vector for the smallest singular value, equivalently an eigenvector of $A^T A$ for its smallest eigenvalue).
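A sketch of that eigenvector view, taking the homogeneous problem to be $\min_{\lVert x\rVert_2=1}\lVert Ax\rVert_2$ (my reading of it; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 4))

# The minimizer of ||A x|| over unit vectors is the right singular vector
# belonging to the smallest singular value.
U, s, Vt = np.linalg.svd(A)
x = Vt[-1]                               # last row of Vt
print(np.linalg.norm(A @ x), s[-1])      # equal

# Equivalently, x is an eigenvector of A^T A for its smallest eigenvalue.
w, Q = np.linalg.eigh(A.T @ A)           # eigenvalues in ascending order
print(abs(Q[:, 0] @ x))                  # 1.0 (up to sign)
```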
Finally, a fourth comment, not directly related to your three questions: in reasonably small problems, there isn't much reason to use the SVD. You still should not use the normal equations, but the QR decomposition will do the job just as well, and it will terminate in an amount of time that you can know in advance.
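For completeness, a QR-based solve (a sketch; in production one would use a dedicated triangular solver such as scipy.linalg.solve_triangular rather than a general solve):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((20, 4))
b = rng.standard_normal(20)

# Thin QR: A = Q R with orthonormal columns Q (20x4), upper triangular R (4x4).
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)       # solve R x = Q^T b

x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_qr, x_svd))          # True
```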
Using the singular value decomposition to solve the full column rank linear system
$$
\mathbf{A} x = b
$$
where
$$
\mathbf{A}\in\mathbb{C}^{m\times n}_{n}, \quad
x \in\mathbb{C}^{n}, \quad
b \in\mathbb{C}^{m}
$$
By construction, $m\ge n$ and the matrix rank $\rho=n$.
Singular value decomposition
The singular value decomposition is guaranteed to exist:
$$
\mathbf{A}
=
\mathbf{U} \,
\Sigma \,
\mathbf{V}^{*}
=
\left[
\begin{array}{cc}
\color{blue}{\mathbf{U}_{\mathcal{R}}} &
\color{red}{\mathbf{U}_{\mathcal{N}}}
\end{array}
\right]
%
\left[
\begin{array}{c}
\mathbf{S} \\
\mathbf{0}
\end{array}
\right]
%
\left[
\begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*}
\end{array}
\right].
$$
Because the matrix has full column rank, the null space $\color{red}{\mathcal{N}\left( \mathbf{A}\right)}$ is trivial.
The codomain matrix $\mathbf{U}\in\mathbb{C}^{m\times m}$ and the domain matrix $\mathbf{V}\in\mathbb{C}^{n\times n}$ are unitary:
$$
\mathbf{U}^{*}\mathbf{U} = \mathbf{U}\,\mathbf{U}^{*} = \mathbf{I}_{m}, \quad
\mathbf{V}^{*}\mathbf{V} = \mathbf{V}\,\mathbf{V}^{*} = \mathbf{I}_{n}.
$$
The column vectors of $\color{blue}{\mathbf{U}_{\mathcal{R}}}\in\mathbb{C}^{m\times\rho}$ form an orthonormal basis for $\color{blue}{\mathcal{R}}(\mathbf{A})$; the column vectors of $\color{blue}{\mathbf{V}_{\mathcal{R}}}\in\mathbb{C}^{n\times\rho}$ form an orthonormal basis for $\color{blue}{\mathcal{R}}(\mathbf{A}^{*})$. Similarly, the column vectors of $\color{red}{\mathbf{U}_{\mathcal{N}}}\in\mathbb{C}^{m\times (m-\rho)}$ form an orthonormal basis for $\color{red}{\mathcal{N}(\mathbf{A}^{*})}$.
The $\rho$ singular values are ordered and real:
$$
\sigma_{1} \ge \sigma_{2} \ge \dots \ge \sigma_{\rho}>0.
$$
These singular values form the diagonal matrix of singular values
$$
\mathbf{S} = \operatorname{diag} (\sigma_{1},\sigma_{2},\dots,\sigma_{\rho}) \in\mathbb{R}^{\rho\times\rho}.
$$
The $\mathbf{S}$ matrix is embedded in the sabot matrix $\Sigma\in\mathbb{R}^{m\times n}$, whose shape ensures conformality.
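A quick NumPy check of this embedding (the $5\times 3$ size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))

# Full SVD: U is 5x5, Vt is 3x3, so the 3 singular values must be embedded
# in a 5x3 Sigma for U @ Sigma @ Vt to be conformable.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((5, 3))
np.fill_diagonal(Sigma, s)
print(np.allclose(U @ Sigma @ Vt, A))    # True
```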
Least squares solution
Consider the case where the data vector is not entirely in the range, $b\notin\color{blue}{\mathcal{R}\left( \mathbf{A} \right)}$, and the linear system
$$
\mathbf{A} x - b = \mathbf{0}
$$
has no exact solution. Generalize the question and ask instead for the best vector $x$: the one which minimizes the residual $r^{2}(x) = \lVert \mathbf{A} x - b \rVert_{2}^{2}$ in the $2$-norm. The least squares minimizers are defined as
$$
x_{LS} =
\left\{
x \in \mathbb{C}^{n} \colon
\big\lVert
\mathbf{A} x - b
\big\rVert_{2}^{2}
\text{ is minimized}
\right\}.
$$
Exploit SVD: separate range and null spaces
The $2$-norm is invariant under unitary transformations. For example,
$$
\lVert \mathbf{V} x \rVert_{2} = \lVert x \rVert_{2}.
$$
This provides the freedom to transform a problem into a form that is easier to manipulate. In this case,
$$
\lVert
\mathbf{U}^{*} \mathbf{A}x - \mathbf{U}^{*} b
\rVert_{2}^{2}
=
\lVert
\Sigma \mathbf{V}^{*} x - \mathbf{U}^{*} b
\rVert_{2}^{2}.
$$
Switching to the block form which separates range and null spaces,
$$
\begin{align}
r^{2}(x) =
\big\lVert
\Sigma\, \mathbf{V}^{*} x - \mathbf{U}^{*} b
\big\rVert_{2}^{2}
&=
\Bigg\lVert
%
\left[
\begin{array}{c}
\mathbf{S} \\
\mathbf{0}
\end{array}
\right]
%
\left[
\begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*}
\end{array}
\right]
%
x -
\left[
\begin{array}{c}
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \\[4pt]
\color{red}{\mathbf{U}_{\mathcal{N}}}^{*}
\end{array}
\right]
b
\Bigg\rVert_{2}^{2} \\
&=
\big\lVert
\mathbf{S}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} x -
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} b
\big\rVert_{2}^{2}
+
\big\lVert
\color{red}{\mathbf{U}_{\mathcal{N}}}^{*} b
\big\rVert_{2}^{2}
\end{align}
$$
Moore-Penrose pseudoinverse
By varying the solution vector $x$ we can control the range space term and force it to $0$:
$$
\mathbf{S}\,
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*} x -
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} b
=
0
\qquad
\Rightarrow
\qquad
x =
\color{blue}{\mathbf{V}_{\mathcal{R}}}
\mathbf{S}^{-1}
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*}b.
$$
This is the point solution, the solution of least norm:
$$
x_{LS} = \mathbf{A}^{\dagger} b
=
\color{blue}{\mathbf{V}_{\mathcal{R}}}
\mathbf{S}^{-1}
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*}b
$$
using the thin SVD.
The error term cannot be removed, leaving a residual error:
$$
r^{2}\left(x_{LS}\right) =
\big\lVert
\color{red}{\mathbf{U}_{\mathcal{N}}}^{*} b
\big\rVert_{2}^{2},
$$
which quantifies how much of the data vector $b$ resides in the null space $\color{red}{\mathcal{N}\left(\mathbf{A}^{*}\right)}$.
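A sketch verifying both the thin-SVD solution and the residual formula (sizes arbitrary; real matrices, so $*$ becomes a transpose):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((9, 4))          # full column rank (generically)
b = rng.standard_normal(9)

# Thin SVD: U_R is 9x4, s holds the diagonal of S, V_Rt is 4x4.
U_R, s, V_Rt = np.linalg.svd(A, full_matrices=False)
x_ls = V_Rt.T @ ((U_R.T @ b) / s)        # x = V_R S^{-1} U_R^* b

# The residual equals the norm of U_N^* b, the part of b
# lying in the null space of A^*.
U_full, _, _ = np.linalg.svd(A, full_matrices=True)
U_N = U_full[:, 4:]                      # 9x5 basis for N(A^*)
print(np.isclose(np.linalg.norm(A @ x_ls - b),
                 np.linalg.norm(U_N.T @ b)))   # True
```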
Orthogonality
Your question raises a point that is at times overlooked: the expression of orthogonality for the domain matrices. To be clear,
$$
\left[
\begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}
\end{array}
\right]
\left[
\begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*}
\end{array}
\right]
=
\left[
\begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}^{*}
\end{array}
\right]
\left[
\begin{array}{c}
\color{blue}{\mathbf{V}_{\mathcal{R}}}
\end{array}
\right]
=
\mathbf{I}_{n},
$$
and
$$
\left[
\begin{array}{cc}
\color{blue}{\mathbf{U}_{\mathcal{R}}} &
\color{red}{\mathbf{U}_{\mathcal{N}}}
\end{array}
\right]
\left[
\begin{array}{l}
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \\
\color{red}{\mathbf{U}_{\mathcal{N}}}^{*}
\end{array}
\right]
=
\left[
\begin{array}{l}
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \\
\color{red}{\mathbf{U}_{\mathcal{N}}}^{*}
\end{array}
\right]
\left[
\begin{array}{cc}
\color{blue}{\mathbf{U}_{\mathcal{R}}} &
\color{red}{\mathbf{U}_{\mathcal{N}}}
\end{array}
\right]
=
\mathbf{I}_{m}.
$$
Keep in mind that when restricted to the range space component we get an identity matrix in only one direction:
$$
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*}
\color{blue}{\mathbf{U}_{\mathcal{R}}} = \mathbf{I}_{\rho},
$$
$$
\color{blue}{\mathbf{U}_{\mathcal{R}}}
\color{blue}{\mathbf{U}_{\mathcal{R}}}^{*} \ne \mathbf{I}_{m}.
$$
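A sketch of this asymmetry (real case, so $*$ is a transpose):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 3))
U_R, s, V_Rt = np.linalg.svd(A, full_matrices=False)   # U_R is 6x3

print(np.allclose(U_R.T @ U_R, np.eye(3)))   # True:  U_R* U_R = I_rho
print(np.allclose(U_R @ U_R.T, np.eye(6)))   # False: U_R U_R* is only the
                                             # orthogonal projector onto R(A)
```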
Best Answer
I think it could be instructive to see what happens in a concrete case. Consider, for instance, $n=3, m=2$, with $A$ an $n\times m$ matrix of rank $q$, so that $A$ maps $\mathbb{R}^{m}$ into $\mathbb{R}^{n}$. Then if $q=m$, the image of $A$ is a plane, and solving $A^TAx=A^Tb$ finds the point on the plane which is closest to $b$, and the $x$ that is mapped to that point by $A$.
If $q=1$, then the image of $A$ is a line. Solving $A^TAx=A^Tb$ finds the point on that line which is closest to $b$, and the set of $x$ that are mapped to that point by $A$ (a whole line of them, since the null space of $A$ is now one-dimensional).
There will always be at least one solution: $A^Tb$ is in the image of $A^T$, and since $A^T$ and $A^TA$ have the same rank and the image of $A^TA$ is contained in the image of $A^T$, the two images coincide, giving us at least one solution.
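A quick numerical check of this consistency argument (a sketch under the $n\times m$ convention above, with $n=3$, $m=2$, $q=1$):

```python
import numpy as np

rng = np.random.default_rng(9)
# A is 3x2 of rank 1: it maps R^2 onto a line in R^3.
A = rng.standard_normal((3, 1)) @ rng.standard_normal((1, 2))
b = rng.standard_normal(3)

print(np.linalg.matrix_rank(A.T), np.linalg.matrix_rank(A.T @ A))  # both 1

# A^T b lies in the common image, so the normal equations are consistent.
x, *_ = np.linalg.lstsq(A.T @ A, A.T @ b, rcond=None)
print(np.allclose(A.T @ A @ x, A.T @ b))                           # True
```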