The (Vector-)Constrained Least-Squares Problem

least squareslinear algebraoptimization

We got the following problem in class:

Let $A\in\mathbb R^{m\times n}$, $b\in\mathbb R^{m}$, $C\in\mathbb R^{p\times n}$ has independent rows and $d\in\text{im}\left(C\right) \subset \mathbb R^{p}$. Consider the minimization problem $$\min_{x\in \mathbb R^{n}}\frac{1}{2}\left\vert\left\vert Ax – b \right\vert\right\vert_{2}^{2} \quad \text{s.t.} \quad Cx = d. \qquad\qquad \left(\star\right)$$ Prove that a vector $\hat{x}\in\mathbb R^{n}$ solves $\left(\star\right)$ if and only if there exists $z\in\mathbb R^{p}$ s.t. $$\begin{pmatrix} A^{T}A & C^{T} \\ C & 0 \end{pmatrix}\begin{pmatrix} \hat{x} \\ z \end{pmatrix} = \begin{pmatrix} A^{T}b \\ d \end{pmatrix}.$$ Moreover, prove that the solution is uniquely defined if $\begin{pmatrix} A\\ C \end{pmatrix}$ has linearly independent columns.

My problem is with $"\Leftarrow"$, I managed $"\Rightarrow"$ with the help of Lagrange multipliers $\lambda$ and by setting $z:=-\lambda$. But for $"\Leftarrow"$, it's not quite clear how to "get rid" of $z$ and $C$, as they do not appear in $\left( \star\right)$. I mean, the matrix equation can be written as: $$A^{T}A\hat{x} + C^{T}z = A^{T}b, \qquad C\hat{x} = d.$$ I was now wondering whether I could simply integrate the two equations with respect to a vector and get a scalar, and even if it's possible, it is a priori not clear to me how to recover the constant $b^{T}b \in \mathbb R$ that we get from the norm-squared..

Best Answer

Since you need help on the "$\Leftarrow$" part, this is how it can be done.

Suppose that there exists $z\in\mathbb R^{p}$ satisfying \begin{align} A^TAx^* + C^Tz &= A^Tb\tag{1}\label{1}\\ Cx^* &= d\tag{2}\label{2}. \end{align}

We need to show that $x^*$ solves $(\star)$, or equivalently \begin{equation} \|Ax-b\|_2^2 \ge \|Ax^*-b\|_2^2 \quad\mbox{for all } x \mbox{ such that } Cx=d. \end{equation} We know the equality will occur when $x=x^*$, so it is natural to decompose the LHS as \begin{align} \|Ax-b\|_2^2 &= \|Ax-Ax^* + Ax^* - b\|_2^2 \\ &= \|A(x-x^*)\|_2^2 + \|Ax^* - b\|_2^2 + 2(x-x^*)^TA^T(Ax^* - b). \end{align} The last term vanishes: \begin{align} (x-x^*)^TA^T(Ax^* - b) &= (x-x^*)^T(A^TAx^* - A^Tb)\\ &=-(x-x^*)^TC^Tz\\ &=-z^T(Cx-Cx^*)\\ &=0. \end{align} The conclusion follows.


By the way, you should view the original linear constraint as $d=Cx$ to have the Lagrangian $L(x,z) = f(x) + z^T(d-Cx)$, which helps avoid the change of variables $z=-\lambda$.