Matrix least square optimization with constraint

Tags: constraints, lagrange multiplier, optimization

I want to minimize the following equation (the error matrix) by
formulating a cost function and finding the point where
its gradient equals zero.
\begin{equation}
\hat{X} = \arg \min_{X} \frac{1}{2} {\left\| AX - B \right\|}_{F}^{2}
\end{equation}

where $$X= {\rm Diag}(x)$$
I solved this optimization problem without the constraint:
$$\eqalign{
E &= AX-B\\
\phi &= \tfrac{1}{2}\|E\|^2 = \tfrac{1}{2}E:E \\
d\phi &= E:dE = E:A\,dX = A^TE:dX \\
&= A^TE:{\rm Diag}(dx) \\
&= {\rm diag}\Big(A^T\big(AX-B\big)\Big):dx \\
\frac{\partial\phi}{\partial x}
&= {\rm diag}(A^TAX) - {\rm diag}(A^TB) \\
}$$
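As a quick sanity check on the gradient formula ${\rm diag}\big(A^T(AX-B)\big)$, here is a minimal NumPy sketch (the matrix sizes and random data are arbitrary, chosen only for illustration) comparing it against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)

def phi(x):
    # phi(x) = 1/2 * ||A Diag(x) - B||_F^2
    return 0.5 * np.linalg.norm(A @ np.diag(x) - B, 'fro') ** 2

# analytical gradient: diag(A^T (A Diag(x) - B))
grad = np.diag(A.T @ (A @ np.diag(x) - B))

# central finite differences for comparison
h = 1e-6
fd = np.array([(phi(x + h * np.eye(n)[i]) - phi(x - h * np.eye(n)[i])) / (2 * h)
               for i in range(n)])

print(np.max(np.abs(grad - fd)))  # should be at round-off level
```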

Setting the gradient to zero and solving for the optimal $x$:
$$\eqalign{
{\rm diag}(A^TB) &= {\rm diag}(A^TAX) \\
&= \left(I\odot A^TA\right)x \\
x &= (I\odot A^TA)^{-1}\operatorname{diag}\left(A^TB\right) \\
}$$
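Since $I\odot A^TA$ is diagonal, the inverse reduces to elementwise division, i.e. $x_i = a_i^T b_i / a_i^T a_i$. A minimal NumPy sketch of this closed-form unconstrained solution (random test data assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

# (I o A^T A) is diagonal, so its inverse is elementwise division
x_hat = np.diag(A.T @ B) / np.diag(A.T @ A)

# equivalently: each column of B is fit by the single column a_i,
# giving x_i = (a_i^T b_i) / (a_i^T a_i)
```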

Now, I want to minimize the above optimization problem with a constraint:
\begin{equation}
\hat{X} = \arg \min_{X} \frac{1}{2} {\left\| AX - B \right\|}_{F}^{2}, \quad X= {\rm Diag}(x)
\end{equation}

\begin{align}
\text{subject to } & x_{i}^{min} \leq x_{i} \leq x_{i}^{max} \\
\end{align}

I recently learned about Lagrange multipliers, but I'm not sure how to apply them to my problem. I would appreciate it if anyone could show me how.

Best Answer

Let $a_i$ and $b_i$ be the $i$th columns of $A$ and $B$, respectively. Let $\tilde A$ be the block diagonal matrix whose $i$th diagonal block is $a_i$, and let $b$ be the column vector obtained by vertically concatenating the vectors $b_i$. Notice that $$\| AX - B \|_F^2 = \| \tilde A x - b \|_2^2. $$ So your optimization problem is \begin{align} \text{minimize} & \quad \frac12 \| \tilde A x - b \|_2^2 \\ \text{subject to} & \quad c \leq x \leq d \end{align} where $c$ is the vector whose $i$th component is $x_i^\min$ and $d$ is the vector whose $i$th component is $x_i^\max$. The optimization variable is $x$. Projecting onto the constraint set is easy, so you could solve this problem using the projected gradient method or an accelerated projected gradient method. (Other methods such as interior point methods are also possible.)
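To make the reshaping concrete, here is a small NumPy/SciPy sketch (random test data assumed for illustration) that builds $\tilde A$ and $b$ as described and checks that $\|AX - B\|_F = \|\tilde A x - b\|_2$:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# tilde A: block diagonal matrix whose i-th block is the i-th column of A  (shape m*n x n)
A_tilde = block_diag(*[A[:, i:i + 1] for i in range(n)])
# b: the columns of B stacked vertically
b = B.flatten(order='F')

lhs = np.linalg.norm(A @ np.diag(x) - B, 'fro')
rhs = np.linalg.norm(A_tilde @ x - b)
print(np.isclose(lhs, rhs))  # True
```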

When implementing the projected gradient method, it will help to know that the gradient of your objective function $$ f(x) = \frac12 \| \tilde A x - b \|_2^2 $$ is $$ \nabla f(x) = \tilde A^T(\tilde A x - b). $$
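A minimal sketch of the projected gradient method, assuming NumPy and a fixed step size $1/L$ with $L = \|\tilde A\|_2^2$ (the Lipschitz constant of $\nabla f$); the iteration count and the starting point are arbitrary choices for illustration:

```python
import numpy as np

def projected_gradient(A_tilde, b, c, d, iters=500):
    """Minimize 1/2 ||A_tilde x - b||_2^2 subject to c <= x <= d
    using projected gradient descent with a fixed step 1/L."""
    # Lipschitz constant of the gradient: largest eigenvalue of A_tilde^T A_tilde
    L = np.linalg.norm(A_tilde, 2) ** 2
    x = np.clip(np.zeros(A_tilde.shape[1]), c, d)  # feasible starting point
    for _ in range(iters):
        grad = A_tilde.T @ (A_tilde @ x - b)       # gradient of f at x
        x = np.clip(x - grad / L, c, d)            # gradient step, then project
    return x
```

Because the constraint set is a box, the projection is just an elementwise clip of $x$ to $[c, d]$, which is why the projected gradient method is so convenient here.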
