On the Uniqueness of the (Vector-)Constrained Least-Squares Problem

least squares, linear algebra, optimization

We got the following problem in class:

Let $A\in\mathbb R^{m\times n}$, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ have independent rows and $d\in\text{im}\left(C\right) \subset \mathbb R^{p}$. Consider the minimization problem $$\min_{x\in \mathbb R^{n}}\frac{1}{2}\left\vert\left\vert Ax - b \right\vert\right\vert_{2}^{2} \quad \text{s.t.} \quad Cx = d. \qquad\qquad \left(\star\right)$$ Prove that a vector $\hat{x}\in\mathbb R^{n}$ solves $\left(\star\right)$ if and only if there exists $z\in\mathbb R^{p}$ s.t. $$\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}b \\ d \end{pmatrix}.$$ Moreover, prove that the solution is uniquely defined if $\begin{pmatrix} A\\ C \end{pmatrix}$ has linearly independent columns.
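To make the block system concrete, here is a small NumPy sketch I used to sanity-check it (the random instance, dimensions, and seed are my own illustrative choices, not part of the exercise): it solves the block system and cross-checks against eliminating the constraint via the null space of $C$.

```python
import numpy as np

# Illustrative random instance: C has independent rows and [A; C] has
# independent columns (generically true for these dimensions).
rng = np.random.default_rng(0)
m, n, p = 8, 5, 2
A = rng.standard_normal((m, n))
C = rng.standard_normal((p, n))
b = rng.standard_normal(m)
d = C @ rng.standard_normal(n)           # d lies in im(C) by construction

# Assemble and solve the block system from the statement.
K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
x_hat = np.linalg.solve(K, np.concatenate([A.T @ b, d]))[:n]

# Cross-check: parameterize the feasible set as x_p + N w with C x_p = d and
# the columns of N spanning ker(C), then solve an unconstrained least squares.
x_p = np.linalg.lstsq(C, d, rcond=None)[0]
N = np.linalg.svd(C)[2][p:].T
w = np.linalg.lstsq(A @ N, b - A @ x_p, rcond=None)[0]
x_ref = x_p + N @ w

print(np.allclose(C @ x_hat, d), np.allclose(x_hat, x_ref))   # both True
```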

In this post, I would like to ask about the uniqueness part. I first tried to do it by assuming there exist two solutions $\hat{x}$ and $\tilde{x}$ to $\left(\star\right)$ and then trying to show that $\hat{x} = \tilde{x}$. Maybe that is possible, but after half a page I did not know how to continue. I found two helpful links, one from MSE and another from some online lecture notes (particularly page 1), and this is how I would do it now:

(i)

In our case, the gradient of the Lagrangian $\mathcal L$ is given by
$$\nabla_{x}\mathcal L = A^{T}Ax - A^{T}b - C^{T}\lambda \overset{!}{=} 0 \Leftrightarrow A^{T}Ax = A^{T}b + C^{T}\lambda.$$ Now, I was wondering whether I could just write $x = \left( A^{T}A\right)^{-1}\left(A^{T}b + C^{T}\lambda\right)$? I mean, for this to be correct, we need that $A^{T}A$ has full rank, i.e. $\text{rank}\left( A^{T}A\right) = n$ must hold. But does this already follow from the fact that $A$ has independent rows (a prerequisite) and $\begin{pmatrix} A \\ C \end{pmatrix}$ has linearly independent columns? To me, it does not quite add up yet.
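To convince myself, here is a tiny concrete example (my own illustrative choice of $A$ and $C$, not from the exercise): $A$ has independent rows and $\begin{pmatrix} A\\ C \end{pmatrix}$ has independent columns, yet $A^{T}A$ is singular, so the inverse is not available in general.

```python
import numpy as np

A = np.array([[1.0, 0.0]])        # one nonzero row: independent rows, but rank(A) = 1 < n = 2
C = np.array([[0.0, 1.0]])        # C has independent rows
stacked = np.vstack([A, C])       # [A; C] is the 2x2 identity: independent columns

print(np.linalg.matrix_rank(A.T @ A))   # 1  -> A^T A is singular
print(np.linalg.matrix_rank(stacked))   # 2  -> full column rank
```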

I know that I haven't used anywhere that the least-squares problem is a (strictly) convex problem; maybe that's part of the solution?

Edit: Let $$By := \begin{pmatrix}A^{T}A & C^{T} \\ C & 0 \end{pmatrix}y = 0,$$ then $$\begin{pmatrix} A^{T}Ay_1 + C^{T}y_2 \\ Cy_{1} \end{pmatrix} = \begin{pmatrix}0 \\ 0\end{pmatrix}.$$ Now, from the last component, why does it necessarily follow that $y_2 = 0$? For example, couldn't it be that $p = 1 = n$, s.t. $C\in\mathbb R^{1\times 1}$, and then more specifically $C = 0$? Or does this have to hold for arbitrary $C$?

Now for the first component (under the assumption of $y_2 = 0$): $A^{T}Ay_1 = 0$. I guess as above, we say that this has to hold for arbitrary $A^{T}A$, and thus $y_1 = 0$?
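For what it's worth, the hypothetical $C = 0$ case I mentioned can be checked directly (my own toy example, not from the exercise): there the rows of $C$ are not independent, and $B$ really is singular, so $y_2 = 0$ cannot follow from the components alone without some assumption on $C$.

```python
import numpy as np

# Hypothetical 1x1 case with C = 0: the single row of C is NOT independent.
A = np.array([[2.0]])
C = np.array([[0.0]])
B = np.block([[A.T @ A, C.T],
              [C, np.zeros((1, 1))]])
print(B)                          # [[4. 0.], [0. 0.]]
print(np.linalg.matrix_rank(B))   # 1 < 2: any y = (0, y_2) solves B y = 0
```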

Best Answer

First, there seems to be a typo in the question:

Let $A\in\mathbb R^{m\times n}$, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ have independent rows

It should be

Let $A\in\mathbb R^{m\times n}$, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ has independent rows

which means only $C$ needs to have independent rows.

Second, since you wondered if the convexity was used somewhere: yes, it was used in the first part of the question. The linear system \begin{equation}\label{kkt} \begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}b \\ d \end{pmatrix}\tag{$\star\star$} \end{equation} corresponds to the KKT conditions of the original optimization problem. A solution of this system is also a solution of the original problem precisely because the problem is convex, and this is how to prove the first part. Note that the variable $z$ here plays the role of $-\lambda$, where $\lambda$ is the multiplier in your Lagrangian above.

Finally, for the second part, it suffices to show that if $\begin{pmatrix} A\\ C \end{pmatrix}$ has linearly independent columns (in addition to $C$ having independent rows), then the matrix $\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}$ is invertible. Why is that? Because if this matrix is invertible then \eqref{kkt} yields the unique solution \begin{equation} \begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}^{-1} \begin{pmatrix} A^{T}b \\ d \end{pmatrix}. \end{equation} To prove the invertibility, it suffices to show that if $\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} x \\ z \end{pmatrix} = 0$ then $x=z=0$ (i.e., the columns of that matrix are linearly independent). Indeed, consider \begin{align} A^TAx + C^Tz &= 0\tag{1}\label{1}\\ Cx &= 0\tag{2}\label{2}. \end{align} From \eqref{1} we have $x^T(A^TAx + C^Tz) = 0$, which means $\|Ax\|_2^2 + (Cx)^Tz = 0$, implying $\|Ax\|_2^2 = 0$ because $Cx = 0$ according to \eqref{2}. Thus we have $Ax=Cx=0$, or equivalently, $\begin{pmatrix} A\\ C \end{pmatrix}x = 0$, which implies $x=0$ because the columns of $\begin{pmatrix} A\\ C \end{pmatrix}$ are independent. It remains to prove $z=0$, which is true because \eqref{1} (with $x=0$) yields $C^Tz=0$, while the columns of $C^T$ are independent since $C$ has independent rows. We are done.
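If you want to see this numerically, here is a quick sanity check (the random instance is only an illustration of the assumptions, not part of the proof): when $C$ has independent rows and $\begin{pmatrix} A\\ C \end{pmatrix}$ has independent columns, the KKT matrix has full rank $n+p$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p = 6, 4, 2
A = rng.standard_normal((m, n))
C = rng.standard_normal((p, n))

K = np.block([[A.T @ A, C.T],
              [C, np.zeros((p, p))]])
print(np.linalg.matrix_rank(np.vstack([A, C])) == n)   # independent columns (generic)
print(np.linalg.matrix_rank(C) == p)                    # independent rows (generic)
print(np.linalg.matrix_rank(K) == n + p)                # KKT matrix invertible
```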
