Solve for $X$: $\min_{X} \frac{1}{2}\left\|Y - X \right\|_F^2 \ \text{s.t.} \ A_i \, X \, e_i = Be_i, \ i=1,\ldots,n; \ C X^T = 0$

convex optimization, least squares, linear algebra, optimization, quadratic programming

The problem at hand is

\begin{align}
\text{minimize}_{X \in \mathbb{R}^{m \times n}} \quad & \frac{1}{2}\left\|Y - X \right\|_F^2\\
\text{subject to }\quad & A_i \ X \ e_i = Be_i, \quad i=1,\ldots,n \\
& C X^T = 0 \Longleftrightarrow XC^T = 0\ ,
\end{align}

where $e_i \in \mathbb{R}^n$ is the $i$th standard basis vector of $\mathbb{R}^n$, $Y \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{k \times n}$, $A_i \in \mathbb{R}^{k \times m}$, and $C \in \mathbb{R}^{p \times n}$.

Forming the Lagrangian with Lagrange multipliers $u_i \in \mathbb{R}^{k}$, $i=1,\ldots,n$, and $V \in \mathbb{R}^{m \times p}$:
\begin{align}
L(X,\{u_i\},V) := \frac{1}{2}\left\|Y - X \right\|_F^2 + \sum_{i=1}^n u_i^T \left( A_i X e_i - B e_i \right) + {\rm Tr} \left( V^T \left( X C^T \right) \right) \ .
\end{align}

Question: what is the closed-form solution for $X$?


I am unable to work this out in matrix form. 🙁

My partial attempt (hopefully not making a mistake):
\begin{align}
\frac{\partial L}{\partial X} = 0 \Rightarrow -\left( Y - X\right) + \sum_i A_i^T u_i e_i^T + VC = 0 \\
\Longleftrightarrow X = Y - \sum_i A_i^T u_i e_i^T - VC \ ,
\end{align}

\begin{align}
\frac{\partial L}{\partial u_i} = 0 \Rightarrow & A_i X e_i - B e_i = 0 \\
& \Longleftrightarrow A_i \left( Y - \sum_j A_j^T u_j e_j^T - VC \right) e_i - B e_i = 0 \\
& \Longleftrightarrow A_i \left( Ye_i - \sum_j A_j^T u_j \underbrace{e_j^Te_i}_{=1 \ {\rm if} \ i=j, \ =0 \ {\rm otherwise} } - VC e_i\right) - B e_i = 0,\\
& \Longleftrightarrow A_i Ye_i - A_iA_i^T u_i - A_iVC e_i - B e_i = 0,
\end{align}

and
\begin{align}
\frac{\partial L}{\partial V} = 0 \Rightarrow & XC^T = 0 \\
& \Longleftrightarrow \left( Y - \sum_i A_i^T u_i e_i^T - VC \right) C^T = 0
\end{align}

Now, I fail to extract the Lagrange multipliers :( … Your help would be highly appreciated.


However, I can solve the above problem by vectorizing, i.e., stacking the column vectors of the matrix, such that $y := {\rm vec}(Y) \in \mathbb{R}^{mn}$, $x := {\rm vec}(X) \in \mathbb{R}^{mn}$, and form a block diagonal matrix $\mathbb{A} := {\rm blkdiag}\left(\{A_1,\ldots,A_n\} \right) \in \mathbb{R}^{kn \times mn}$, $b := {\rm vec}(B) \in \mathbb{R}^{kn}$. Also, define $\mathbb{C} := C \otimes I_m \in \mathbb{R}^{pm \times mn}$.

The optimization problem in vectorization form can be rewritten (hopefully not making a mistake) as
\begin{align}
\text{minimize}_{x \in \mathbb{R}^{mn}} \quad & \frac{1}{2}\left\|y - x \right\|_2^2\\
\text{subject to }\quad & \mathbb{A} x = b \\
& \mathbb{C} x = 0 \ .
\end{align}
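As a sanity check on this vectorized formulation, here is a small numerical sketch (NumPy/SciPy with random test data; `A_blk` and `C_blk` are my names for $\mathbb{A}$ and $\mathbb{C}$) verifying that $\mathbb{A}\,{\rm vec}(X)$ stacks the per-column products $A_i X e_i$ and that $\mathbb{C}\,{\rm vec}(X) = {\rm vec}(X C^T)$:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
m, n, k, p = 4, 3, 2, 2
A_list = [rng.standard_normal((k, m)) for _ in range(n)]
C = rng.standard_normal((p, n))
X = rng.standard_normal((m, n))  # arbitrary test matrix

def vec(M):
    # column-stacking vectorisation, vec(M)
    return M.flatten(order='F')

A_blk = block_diag(*A_list)    # blkdiag(A_1, ..., A_n), size kn x mn
C_blk = np.kron(C, np.eye(m))  # C (kron) I_m, size pm x mn

# blkdiag applied to vec(X) reproduces the per-column products A_i X e_i
lhs = A_blk @ vec(X)
rhs = np.concatenate([A_list[i] @ X[:, i] for i in range(n)])
assert np.allclose(lhs, rhs)

# (C kron I_m) vec(X) = vec(X C^T), so C_blk @ x = 0 encodes X C^T = 0
assert np.allclose(C_blk @ vec(X), vec(X @ C.T))
```

The second identity is just the standard rule $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$ applied with $A = I_m$ and $B = C^T$.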

Forming a Lagrangian (with Lagrange multipliers, $u \in \mathbb{R}^{kn}$ and $v \in \mathbb{R}^{pm}$)
\begin{align}
L(x,u,v) := \frac{1}{2}\left\|y - x \right\|_2^2 + u^T \left( \mathbb{A} x - b \right) + v^T \left( \mathbb{C} x \right) \ .
\end{align}

Then I can obtain the solution in closed form (please see below), but not in matrix form :S.

\begin{align}
\frac{\partial L}{\partial x} = 0 \Rightarrow x = y - \mathbb{A}^T u - \mathbb{C}^T v \ ,
\end{align}

\begin{align}
\frac{\partial L}{\partial u} = 0 \Rightarrow & \mathbb{A}x - b = 0 \\ & \Longleftrightarrow \mathbb{A} \left( y - \mathbb{A}^T u - \mathbb{C}^T v \right) - b = 0 \\
& \Longleftrightarrow u = \left(\mathbb{A}\mathbb{A}^T \right)^{-1}\left( \mathbb{A} y - b - \mathbb{A} \mathbb{C}^T v \right),
\end{align}

and

\begin{align}
\frac{\partial L}{\partial v} = 0 \Rightarrow & \mathbb{C}x = 0 \\ & \Longleftrightarrow \mathbb{C} \left( y - \mathbb{A}^T u - \mathbb{C}^T v \right) = 0 \\
& \Longleftrightarrow v = \left(\mathbb{C}\mathbb{C}^T \right)^{-1}\left( \mathbb{C} y - \mathbb{C} \mathbb{A}^T u \right).
\end{align}

This assumes $\mathbb{A}\mathbb{A}^T$ and $\mathbb{C}\mathbb{C}^T$ are invertible. Substituting one equation into the other, $u$ and $v$ can be pulled out after some algebra. Thus the closed form is doable but messy.
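Instead of eliminating one multiplier at a time, the two stationarity conditions for $u$ and $v$ can be stacked and solved jointly as a single symmetric linear system. A minimal numerical sketch (NumPy/SciPy, random test data; it assumes the stacked matrix $\begin{pmatrix}\mathbb{A}\\ \mathbb{C}\end{pmatrix}$ has full row rank, so the Gram matrix below is invertible):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(1)
m, n, k, p = 5, 4, 2, 2  # chosen so that kn + pm <= mn
Y = rng.standard_normal((m, n))
A_list = [rng.standard_normal((k, m)) for _ in range(n)]
B = rng.standard_normal((k, n))
C = rng.standard_normal((p, n))

y = Y.flatten(order='F')   # vec(Y)
b = B.flatten(order='F')   # vec(B)
A_blk = block_diag(*A_list)
C_blk = np.kron(C, np.eye(m))

# Stack both multiplier equations into one symmetric system:
# [AA^T  AC^T] [u]   [Ay - b]
# [CA^T  CC^T] [v] = [Cy    ]
M = np.vstack([A_blk, C_blk])
rhs = np.concatenate([A_blk @ y - b, C_blk @ y])
uv = np.linalg.solve(M @ M.T, rhs)   # needs M to have full row rank
x = y - M.T @ uv                     # stationarity: x = y - A^T u - C^T v
X = x.reshape((m, n), order='F')

# both constraint families hold at the optimum
assert all(np.allclose(A_list[i] @ X[:, i], B[:, i]) for i in range(n))
assert np.allclose(X @ C.T, 0)
```

Solving the $(kn+pm)\times(kn+pm)$ Gram system once avoids the messy back-and-forth substitution between $u$ and $v$.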


Best Answer

Let $x=\operatorname{vec}(X)$ and $y=\operatorname{vec}(Y)$. The constraints $A_iXe_i=Be_i$ and $CX^T=0$ can be written as $(e_i^T\otimes A_i)x=\operatorname{vec}(Be_i)$ and $(C\otimes I_m)x=0$ and they can be combined into a single linear equation $$ \underbrace{\pmatrix{ e_1^T\otimes A_1\\ \vdots\\ e_n^T\otimes A_n\\ C\otimes I_m}}_A\ x =\underbrace{\pmatrix{\operatorname{vec}(B)\\ 0_{mp\times1}}}_b\in\mathbb R^{nk+mp} $$ (where $A$ is an $(nk+mp)\times mn$ matrix).

Therefore, you are essentially minimising $\frac12\|x-y\|_2^2$ subject to $Ax=b$. A feasible solution exists if and only if $Ax=b$ is solvable. Suppose this is the case. Then $x\in A^+b+\ker(A)$. Since $A^+b\perp\ker(A)$, the minimiser $x$ of $\frac12\|x-y\|_2^2$ subject to $Ax=b$ is given by the sum of $A^+b$ and the orthogonal projection of $y$ onto $\ker(A)$, i.e. $$ x=A^+b+(I_{mn}-A^+A)y. $$
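A minimal numerical sketch of this formula (NumPy, random test data; matrix names follow the answer, and solvability of $Ax=b$ is assumed, which holds here because the stacked $A$ generically has full row rank):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, p = 5, 4, 2, 2
Y = rng.standard_normal((m, n))
A_list = [rng.standard_normal((k, m)) for _ in range(n)]
B = rng.standard_normal((k, n))
C = rng.standard_normal((p, n))

y = Y.flatten(order='F')  # vec(Y), column-stacking

# stack the rows e_i^T (kron) A_i, then C (kron) I_m, as in the answer
blocks = [np.kron(np.eye(1, n, i), A_list[i]) for i in range(n)]
A = np.vstack(blocks + [np.kron(C, np.eye(m))])       # (nk + mp) x mn
b = np.concatenate([B.flatten(order='F'), np.zeros(p * m)])

# x = A^+ b + (I - A^+ A) y : particular solution plus projection
# of y onto ker(A)
A_pinv = np.linalg.pinv(A)
x = A_pinv @ b + (np.eye(m * n) - A_pinv @ A) @ y
X = x.reshape((m, n), order='F')

# the minimiser satisfies both constraint families
assert all(np.allclose(A_list[i] @ X[:, i], B[:, i]) for i in range(n))
assert np.allclose(X @ C.T, 0)
```

Note that `np.eye(1, n, i)` builds the row vector $e_i^T$, so each `np.kron` call forms one block row $e_i^T\otimes A_i$ of $A$.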