Contradicting the uniqueness of the least-squares solution

linear algebra

Find a least-squares solution of $Ax=b$ for
$A=\begin{pmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
\end{pmatrix}
,b=\begin{pmatrix}
-3\\
-1\\
0\\
2\\
5\\
1
\end{pmatrix}$

My approach:
$$A^{T}A=\begin{pmatrix}
6 & 2 & 2 & 2 \\
2 & 2 & 0 & 0 \\
2 & 0 & 2 & 0 \\
2 & 0 & 0 & 2
\end{pmatrix}$$

$$A^{T}b=\begin{pmatrix}
4\\
-4\\
2\\
6
\end{pmatrix}$$
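As a quick numerical sanity check of these two products (a minimal sketch, assuming NumPy is available; `A` and `b` are exactly the matrices given above):

```python
import numpy as np

# A and b as given in the problem statement
A = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
b = np.array([-3, -1, 0, 2, 5, 1], dtype=float)

print(A.T @ A)                          # matches the 4x4 matrix above
print(A.T @ b)                          # [ 4. -4.  2.  6.]
print(np.linalg.matrix_rank(A.T @ A))   # 3 < 4, so A^T A is indeed singular
```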

It seems $A^{T}A$ is not invertible. The augmented matrix for $A^{T}A\hat{x}=A^{T}b$ is $$\begin{pmatrix}
6 & 2 & 2 & 2 & 4\\
2 & 2 & 0 & 0 & -4\\
2 & 0 & 2 & 0 & 2\\
2 & 0 & 0 & 2 &6
\end{pmatrix}\sim\begin{pmatrix}
1 & 0 & 0 & 1 & 3\\
0 & 1 & 0 & -1 & -5\\
0 & 0 & 1 & -1 & -2\\
0 & 0 & 0 & 0 & 0
\end{pmatrix}$$
Hence the general least-squares solution of $Ax=b$ has the form
$$\hat{x}=\begin{pmatrix}3\\-5\\-2\\0\end{pmatrix}+t\begin{pmatrix}-1\\1\\1\\1\end{pmatrix}$$
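Continuing the snippet above, this general solution can be checked directly against the normal equations (again just a sketch):

```python
# Reuses A and b from the snippet above.
x0 = np.array([3, -5, -2, 0], dtype=float)   # particular solution
d  = np.array([-1, 1, 1, 1], dtype=float)    # direction of the free variable

print(np.allclose(A.T @ A @ x0, A.T @ b))    # True: x0 solves the normal equations
print(np.allclose(A.T @ A @ d, 0))           # True: d is in the null space of A^T A
```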
But the orthogonal projection is always unique, so why does that free variable appear? I can't make sense of my outcome! Can anyone explain this result geometrically, and also why it seems to contradict uniqueness?


Update: So the columns of $A$ are not linearly independent, hence the free variable appears; but then $\color{red}{why}$ are there seemingly so many projections of $b$ onto the column space?

Best Answer

Convexity of the Euclidean distance function guarantees that the orthogonal projection onto the column space of $A$ is unique. In fact, all of the solutions to $A^TA\hat{\mathbf x}=A^T\mathbf b$ represent the same vector: since $A(-1,1,1,1)^T=0$ (verify this!), $A\hat{\mathbf x}=(-2,-2,1,1,3,3)^T$ for all values of $t$. However, its expression as a linear combination of the columns of $A$, which is what the vector $\hat{\mathbf x}$ represents, might not be unique, as your example shows.
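You can see this numerically as well (a minimal, self-contained sketch assuming NumPy; the values are the ones from your question):

```python
import numpy as np

A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0],
              [1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1]], dtype=float)
x0 = np.array([3, -5, -2, 0], dtype=float)   # particular solution from the question
d  = np.array([-1, 1, 1, 1], dtype=float)    # free-variable direction

print(A @ d)                     # [0. 0. 0. 0. 0. 0.] -- d lies in the null space of A
for t in (0.0, 1.0, -2.5, 7.0):
    print(A @ (x0 + t * d))      # always [-2. -2.  1.  1.  3.  3.]
```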

It’s instructive to expand the product $M\mathbf v$ of a matrix and vector by columns. Denoting the $j$th column of $M$ by $\mathbf m_j$, we have $$M\mathbf v = \sum_j v_j\mathbf m_j.$$ That is to say the product of a matrix and vector is a linear combination of the columns of the matrix with coefficients provided by the corresponding entries of the vector. From this point of view, we can interpret solving the equation $M\mathbf x=\mathbf b$ as looking for linear combinations of the columns of $M$ that equal $\mathbf b$. For there to be a solution at all, $\mathbf b$ must obviously lie in their span—the column space of $M$.
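For instance (a throwaway example, not tied to the problem above):

```python
import numpy as np

M = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
v = np.array([10, -1], dtype=float)

# M @ v equals the combination of M's columns weighted by the entries of v.
print(np.allclose(M @ v, v[0] * M[:, 0] + v[1] * M[:, 1]))  # True
```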

If the columns of $M$ are linearly independent, they form a basis of $\operatorname{col}(M)$. This is ideal because linear independence ensures that the expression of any $\mathbf b\in\operatorname{col}(M)$ as a linear combination of the $\mathbf m_j$ is unique. The unique solution to $M\mathbf x=\mathbf b$ (if it exists) is essentially the coordinates of $\mathbf b$ relative to this basis. When the $\mathbf m_j$ are linearly dependent, on the other hand, you no longer have uniqueness of this expression. As a trivial example, just add a column of zeros to $M$: this introduces a free variable which you can set arbitrarily—the last “coordinate” of $\mathbf b$ can be anything at all. As another example, consider $\mathbb R$ as a vector space. With the basis $\{1\}$, every $x\in\mathbb R$ obviously has a unique expression as a linear combination of generating vectors: $x\cdot1$. On the other hand, if the spanning set is $\{1,-1\}$, then for a fixed $x$ we want $a\cdot1+b\cdot(-1)=x$, so we can choose any $a$ and $b$ such that $a=x+b$.

Now let’s look at the formula for the orthogonal projection of $\mathbf b$ onto $\operatorname{col}(A)$, namely $A(A^TA)^{-1}A^T\mathbf b$. This is a generalization of the formula ${\mathbf a^T\mathbf b\over\mathbf a^T\mathbf a}\mathbf a$ for orthogonal projection onto the vector $\mathbf a$: you compute the dot product of $\mathbf b$ with $\mathbf a$, divide by $\lVert\mathbf a\rVert^2$ to normalize and obtain the $\mathbf a$-coordinate of $\mathbf b$, and then multiply by $\mathbf a$ to get the vector in the original basis. The matrix formula can be interpreted in a similar way: $A^T\mathbf b$ computes the dot products of $\mathbf b$ with each column of $A$ “in bulk,” $(A^TA)^{-1}$ normalizes and sorts out the effects of any overlap among the columns of $A$, resulting in the $A$-basis coordinates of the projection, and finally we multiply by $A$ to perform a change of basis back to the original. With this reading, we can see why the process breaks down when $A$ doesn’t have full column rank: there isn’t a set of unique $A$-coordinates for any vector.
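When $A^TA$ is singular, the same projection can still be computed with the Moore–Penrose pseudoinverse, which generalizes $(A^TA)^{-1}A^T$ and which NumPy exposes as `np.linalg.pinv`. A minimal sketch (one way among several, using your $A$ and $b$):

```python
import numpy as np

A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0],
              [1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1]], dtype=float)
b = np.array([-3, -1, 0, 2, 5, 1], dtype=float)

# A @ pinv(A) is the orthogonal projector onto col(A), even without full column rank.
proj = A @ np.linalg.pinv(A) @ b
print(proj)                                   # [-2. -2.  1.  1.  3.  3.]

# lstsq picks one member of the solution family: the one of minimum norm.
x_min, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_min)                                  # [ 0.5 -2.5  0.5  2.5], i.e. t = 2.5 in the question's parametrization
```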

This is what’s going on in your example. Since the columns of $A$ aren’t linearly independent, $A^TA$ isn’t invertible, and in particular its columns aren’t linearly independent, either. So, when you try to solve $A^TA\hat{\mathbf x}=A^T\mathbf b$, there isn’t a unique linear combination for $A^T\mathbf b$—there are infinitely many solutions to this equation. However, as I pointed out at the top, all of these solutions represent the same vector. They’re just different ways of expressing it as a linear combination of the columns of $A$.