Least Squares using QR for underdetermined system

least squares, linear algebra, matrices, orthogonal matrices

I am trying to understand the usage of $QR$ decomposition for Least Squares problem of
$$ Ax=b$$
when the system is underdetermined: $A$ is $m\times n$ with $m < n$, but $A$ has full (row) rank and the system is consistent.

In the Wikipedia article on QR factorization, the section "Using for solution to linear inverse problems" isn't clear to me.
My logic for the solution was based on the following:
$$ Ax = (A^T)^Tx = b$$
We can do a $QR$ decomposition of $A^T$ to get an $n \times n$ orthogonal matrix $Q$ and an $n \times m$ matrix $R$.
Now we have
$$ (QR)^Tx=b $$
It seems to me that in order to proceed we must take the inverse transpose of $QR$, i.e. $(QR)^{-T}$.

The Wikipedia article notes that $R = \begin{bmatrix} R_1 \\ 0\end{bmatrix}$ where $R_1$ is $m \times m$, and later says that "with some algebra" we can get

$$\hat{x} = Q\begin{bmatrix} (R_1)^{-T}b \\ 0\end{bmatrix}$$

I do not understand the algebra involved in getting this solution. It seems to me that we are stuck on getting the inverse of $(QR)^T$. I can see how you can get the inverse of $Q$ and bring it to the other side since it is an orthogonal matrix, and the inverse of $R_1$ since it is square, but what are the rules for carrying out these operations on such augmented matrices?

Is it accurate to say that
$$ (Q\begin{bmatrix} R_1 \\ 0\end{bmatrix})^{-T} = Q\begin{bmatrix} R_1^{-T} \\ 0\end{bmatrix}$$

This is the only way I can see to obtain this explicit solution. Why does this work? Or am I missing something?
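To make the question concrete, here is a small NumPy sketch of the construction I am asking about (the matrix $A$ and vector $b$ here are random test data of my own choosing, not from the article):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                      # underdetermined: m < n
A = rng.standard_normal((m, n))  # full row rank (almost surely)
b = rng.standard_normal(m)

# QR of A^T: Q is n x n, R is n x m with R = [R1; 0], R1 upper triangular m x m
Q, R = np.linalg.qr(A.T, mode="complete")
R1 = R[:m, :]

# The Wikipedia formula: x_hat = Q @ [R1^{-T} b; 0]
y1 = np.linalg.solve(R1.T, b)                     # R1^{-T} b without forming an inverse
x_hat = Q @ np.concatenate([y1, np.zeros(n - m)])

print(np.allclose(A @ x_hat, b))  # True: x_hat really solves the system
```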

Best Answer

A least-squares solution via QR decomposition is usually stated for overdetermined systems, i.e. $m>n$. There, "least squares" refers to a property of the residual $e:=Ax-b$, namely that $\|e\|_2$ is minimized.

You have an underdetermined system of equations, meaning that many solutions $x$ fulfill $Ax=b$, so there is no error to minimize. "Least squares" in this setting usually refers to a property of the solution $x$ itself, namely that $\|x\|_2$ is minimal. This solution is usually obtained via the pseudo-inverse $A^\dagger$, which can (most easily) be calculated from the SVD of $A$.
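A minimal sketch of this minimum-norm property (my own illustration with random data; `np.linalg.pinv` computes the pseudo-inverse via the SVD):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 5
A = rng.standard_normal((m, n))  # full row rank (almost surely)
b = rng.standard_normal(m)

# Minimum-norm solution via the SVD-based pseudo-inverse
x_min = np.linalg.pinv(A) @ b

# Any other solution differs from x_min by a null-space vector and is longer:
z = rng.standard_normal(n)
z -= np.linalg.pinv(A) @ (A @ z)   # project z onto null(A)
x_other = x_min + z                # still solves A x = b

print(np.linalg.norm(x_min) <= np.linalg.norm(x_other))  # True
```

The inequality holds because $x_{\min}$ lies in the row space of $A$, which is orthogonal to the null space, so adding any null-space component can only increase the norm.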

The solution provided in the Wikipedia article has nothing to do with a least-squares solution. The general idea for solving underdetermined systems is to ignore many columns of $A$ in order to have a square matrix to invert; you can just choose zeros for the corresponding entries of $x$. But choosing these columns can be hard, as you have $n$ of them to choose from, yet together they span only an $m$-dimensional space. You can choose very poorly and end up with columns that are close to linearly dependent, making your solution numerically unstable. Computing the QR decomposition gives you an orthogonal basis (which is nearly always better), i.e. the "best" submatrix to work with.
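The column-selection idea can be sketched as follows (my own illustration, not the Wikipedia construction): SciPy's column-pivoted QR orders the columns of $A$ so that a well-conditioned square submatrix comes first, and the remaining entries of $x$ are set to zero.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(2)
m, n = 3, 5
A = rng.standard_normal((m, n))  # full row rank (almost surely)
b = rng.standard_normal(m)

# Column-pivoted QR: A[:, piv] = Q @ R, with the "best" columns pivoted first
Q, R, piv = qr(A, pivoting=True)
cols = piv[:m]                   # m well-conditioned columns of A

# Solve the square system on those columns; zeros in the remaining entries of x
x = np.zeros(n)
x[cols] = np.linalg.solve(A[:, cols], b)

print(np.allclose(A @ x, b))  # True: a valid (sparse, but not minimum-norm) solution
```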

I can go into more detail, if required.
