Optimization of a complex matrix with an equality constraint

algorithms, complex-analysis, matrices, numerical-linear-algebra, optimization

Given matrices $A, B \in \mathbb{C}^{(n-s) \times n}$ and matrix $\Sigma \in \mathbb{C}^{n \times n}$, I want to solve the following equality-constrained minimization problem

$$\begin{array}{ll} \underset{X \in \mathbb{C}^{n \times n}}{\text{minimize}} & \| \Sigma - X X^H \|_F\\ \text{subject to} & A X = B\end{array}$$

where $X^H = X^*$ denotes the Hermitian conjugate (conjugate transpose) and $\| \cdot \|_F$ is the Frobenius norm.

First we note that $\| Y \|_F^2 = \operatorname{tr}(YY^H)$, so minimizing
$$ \operatorname{tr}\left( \Sigma\Sigma^H - \Sigma X X^H - X X^H \Sigma^H + X X^H X X^H \right) $$
is equivalent to minimizing the original expression.
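
As a quick sanity check, the identity and the expansion can be verified numerically on random complex matrices; here is a minimal NumPy snippet (my own illustration, using arbitrary random data):

```python
# Check ||Y||_F^2 = tr(Y Y^H) and the expanded trace on random complex matrices.
import numpy as np

rng = np.random.default_rng(0)
n = 4
Sigma = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

Y = Sigma - X @ X.conj().T
lhs = np.linalg.norm(Y, 'fro') ** 2
rhs = np.trace(Y @ Y.conj().T).real
expanded = np.trace(
    Sigma @ Sigma.conj().T
    - Sigma @ X @ X.conj().T
    - X @ X.conj().T @ Sigma.conj().T
    + X @ X.conj().T @ X @ X.conj().T
).real
print(np.isclose(lhs, rhs), np.isclose(lhs, expanded))  # True True
```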

Next I try to see whether there is an analytic (closed-form) solution to this constrained optimization problem, and the natural thing to do is to try the method of Lagrange multipliers:
$$ f(X) = \operatorname{tr}\big( (\Sigma - XX^H)(\Sigma - XX^H)^H \big) + \lambda (A X - B) $$
but then I encountered two problems:

  1. What is $\lambda$? It cannot be a scalar, because the second term is a matrix while the first is a scalar. Moreover, there is no single matrix that turns the second term (the constraint) into a scalar by pre- or post-multiplication. One possible fix is to write each of the $(n-s) \times n$ scalar equations separately, assign each its own multiplier $\lambda_{i,j}$ with $i=1,\dots,n-s$ and $j=1,\dots,n$, and sum them (this gives $(n-s)n$ extra terms in $f(X)$). Another is to replace the term with $\langle \lambda , AX - B \rangle$, where $\lambda \in \mathbb{C}^{(n-s)\times n}$ (one conventional way to spell this out is written after this list).
  2. The objective contains expressions of the form $X X^H$, which involve products $z \bar{z}$, so the first term is not differentiable in the complex sense (it is not holomorphic in the entries of $X$). This hampers the Lagrange multiplier method and other gradient-based algorithms.
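
For reference, a standard way to write the inner-product form mentioned in item 1 (a common convention, stated here as an assumption rather than something from the linked discussion) pairs a matrix multiplier with the constraint via the real part of the trace inner product:

$$\mathcal{L}(X, \lambda) = \operatorname{tr}\big( (\Sigma - XX^H)(\Sigma - XX^H)^H \big) + \operatorname{Re}\operatorname{tr}\big( \lambda^H (AX - B) \big), \qquad \lambda \in \mathbb{C}^{(n-s)\times n}.$$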

See How to set up Lagrangian optimization with matrix constraints for a discussion of question 1.

Am I wrong here? Or do these two arguments indeed show that obtaining an analytic expression via Lagrange multipliers is not feasible?

Another idea is to use the (Moore-Penrose) pseudo-inverse of $A$ and write $X = A^+ B$, but this pins down $X$ completely, which cannot be the right approach: if $X$ were determined uniquely by the constraint, there would be no minimization problem left. Note that since $B \in \mathbb{C}^{(n-s) \times n}$ there are more variables ($X_{i,j}$) than equations, so there are $n^2 - (n-s) n = sn$ degrees of freedom in $X$.
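
To make the degrees-of-freedom count concrete, the full solution set of $AX = B$ can be parametrized as $X = A^+ B + (I - A^+ A) Z$ with $Z \in \mathbb{C}^{n \times n}$ free (assuming $A$ has full row rank, so the system is consistent); the projector $I - A^+ A$ has rank $s$, which accounts for the $sn$ free parameters. A small NumPy sketch of this parametrization:

```python
# Parametrize the affine set {X : A X = B} via the pseudo-inverse:
# X = A^+ B + (I - A^+ A) Z for arbitrary Z, assuming A has full row rank.
import numpy as np

rng = np.random.default_rng(1)
n, s = 6, 2
A = rng.standard_normal((n - s, n)) + 1j * rng.standard_normal((n - s, n))
B = rng.standard_normal((n - s, n)) + 1j * rng.standard_normal((n - s, n))

A_pinv = np.linalg.pinv(A)
P = np.eye(n) - A_pinv @ A        # projector onto the null space of A
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

X = A_pinv @ B + P @ Z            # satisfies A X = B for any choice of Z
print(np.allclose(A @ X, B))      # True
print(np.linalg.matrix_rank(P))   # s, i.e. s*n free complex entries in X
```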

If there is no analytic solution, what is an algorithmic way to solve this constrained minimization problem? I want to be able to program it and check it in Python using packages such as NumPy and SciPy.
(Remark: algorithms that use real gradients probably won't work here, because of the term $XX^H$, which is not differentiable in the complex sense.) Numerical optimization algorithms are fine too.
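
Absent a closed form, one possible (rough, unoptimized) numerical route is to eliminate the constraint with the parametrization above and hand the objective to a generic SciPy solver over the real and imaginary parts of the free matrix $Z$; the sketch below relies on finite-difference gradients, so no complex derivative is ever formed. All sizes and data are placeholders:

```python
# Sketch: minimize || Sigma - X X^H ||_F subject to A X = B by optimizing over
# the free matrix Z in X = A^+ B + (I - A^+ A) Z, using real variables for Z.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, s = 5, 2
A = rng.standard_normal((n - s, n)) + 1j * rng.standard_normal((n - s, n))
B = rng.standard_normal((n - s, n)) + 1j * rng.standard_normal((n - s, n))
Sigma = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

A_pinv = np.linalg.pinv(A)
X0 = A_pinv @ B                      # particular solution of A X = B
P = np.eye(n) - A_pinv @ A           # projector onto null(A)

def unpack(z):
    """Real vector of length 2*n*n -> complex n x n matrix Z."""
    return z[:n * n].reshape(n, n) + 1j * z[n * n:].reshape(n, n)

def objective(z):
    X = X0 + P @ unpack(z)           # feasible by construction: A X = B
    return np.linalg.norm(Sigma - X @ X.conj().T, 'fro') ** 2

res = minimize(objective, np.zeros(2 * n * n), method='L-BFGS-B')
X_opt = X0 + P @ unpack(res.x)
print("constraint residual:", np.linalg.norm(A @ X_opt - B))
print("objective value    :", np.sqrt(res.fun))
```

This is only a baseline to compare against; nothing in it exploits the structure of the problem.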


Best Answer

I am not a big fan of Lagrangian relaxation. I have never found it generic enough. I would rather go with a linear fusion approach.

Let's introduce additional matrices so that we can express a more flexible problem.

$X_h$ approximates $X^H$.

$X_i$ approximates $X^{-1}$ if it exists, or some well-behaved pseudoinverse $X^{\dagger}$ if it doesn't.

Now we can write $$\|X_i\Sigma - X_h\|_F$$ (if $\Sigma \approx X X^H$, then $X^{-1}\Sigma \approx X^H$, so this plays the role of the original objective).

To this we add the regularization terms $\alpha\|AX-B\|$, $\beta\|X_i X - I\|$ and $\gamma\|T X_h - X\|$,

where $T$ performs the conjugate transpose on the vectorization (it maps $\operatorname{vec}(X_h)$ to $\operatorname{vec}(X_h^H)$).

Now there remains one nonlinearity here: the $\beta$ term, which is bilinear in $X_i$ and $X$.

So a single linear least-squares solve will not be enough.

But we can do an iterated two-stage linear least squares:

  1. the first phase optimizes $X, X_h$ (with $X_i$ held fixed),
  2. the second phase optimizes $X_i$ (with $X, X_h$ held fixed).

Using Kronecker products (specifically, the commutation matrix) we can express the "T" operation above as a matrix acting on the vectorization of $X_h$.
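
For the transposition part this is the commutation matrix $K$, which satisfies $K\,\operatorname{vec}(M) = \operatorname{vec}(M^T)$; the complex conjugation is antilinear over $\mathbb{C}$ and has to be handled separately (or by stacking real and imaginary parts). A small sketch of the construction (my own illustration, not from the answer):

```python
# Build the commutation matrix K with K @ vec(M) == vec(M.T) for column-major vec,
# and check that conjugating afterwards yields vec of the conjugate transpose.
import numpy as np

def commutation_matrix(n):
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            # entry i + j*n of vec(M.T), i.e. M[j, i], comes from entry j + i*n of vec(M)
            K[i + j * n, j + i * n] = 1.0
    return K

n = 3
M = np.arange(n * n).reshape(n, n) + 1j * np.arange(n * n).reshape(n, n)[::-1]
K = commutation_matrix(n)
vec = lambda Mtx: Mtx.flatten(order='F')                  # column-major vectorization

print(np.allclose(K @ vec(M), vec(M.T)))                  # transpose:           True
print(np.allclose(np.conj(K @ vec(M)), vec(M.conj().T)))  # conjugate transpose: True
```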

The linear fusion comes into play when we connect $X$ and $X_h$ through the $\gamma$-regularization and $X$ and $X_i$ through the $\beta$-regularization.
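
To make the scheme concrete, here is a rough NumPy sketch of a block-coordinate variant of the idea (my simplification: $X_h$, $X$ and $X_i$ are updated cyclically, each update being a closed-form or linear least-squares step; the weights $\alpha, \beta, \gamma$, the starting point and the iteration count are all ad hoc):

```python
# Block-coordinate sketch: minimize ||X_i Sigma - X_h||_F^2
#   + alpha ||A X - B||_F^2 + beta ||X_i X - I||_F^2 + gamma ||X_h^H - X||_F^2
# by cycling through closed-form / least-squares updates of X_h, X and X_i.
import numpy as np

rng = np.random.default_rng(3)
n, s = 5, 2
alpha, beta, gamma = 10.0, 1.0, 1.0   # ad hoc weights (large alpha pushes A X toward B)

A = rng.standard_normal((n - s, n)) + 1j * rng.standard_normal((n - s, n))
B = rng.standard_normal((n - s, n)) + 1j * rng.standard_normal((n - s, n))
Sigma = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

X = np.linalg.pinv(A) @ B             # a particular solution of A X = B as starting point
X_h = X.conj().T
X_i = np.linalg.pinv(X)

for _ in range(200):
    # X_h update: min ||X_i Sigma - X_h||^2 + gamma ||X_h - X^H||^2  (entrywise closed form)
    X_h = (X_i @ Sigma + gamma * X.conj().T) / (1.0 + gamma)

    # X update: min alpha||A X - B||^2 + beta||X_i X - I||^2 + gamma||X - X_h^H||^2
    # -> a single stacked ("fused") linear least-squares problem in X
    M = np.vstack([np.sqrt(alpha) * A,
                   np.sqrt(beta) * X_i,
                   np.sqrt(gamma) * np.eye(n)])
    R = np.vstack([np.sqrt(alpha) * B,
                   np.sqrt(beta) * np.eye(n),
                   np.sqrt(gamma) * X_h.conj().T])
    X = np.linalg.lstsq(M, R, rcond=None)[0]

    # X_i update: min ||X_i Sigma - X_h||^2 + beta||X_i X - I||^2, solved as least squares
    C = np.hstack([Sigma, np.sqrt(beta) * X])
    D = np.hstack([X_h, np.sqrt(beta) * np.eye(n)])
    X_i = np.linalg.lstsq(C.conj().T, D.conj().T, rcond=None)[0].conj().T

print("||Sigma - X X^H||_F :", np.linalg.norm(Sigma - X @ X.conj().T))
print("||A X - B||_F       :", np.linalg.norm(A @ X - B))
```

Each $X$ update stacks the $\alpha$, $\beta$ and $\gamma$ terms into one tall least-squares system, which is the "linear fusion" idea in matrix form; how well the heuristic converges will depend on the weights and on the data.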