Inverse Hessian Operator with Wirtinger calculus

complex-geometry riemannian-geometry

First, a little bit of context. I'm working on a minimization problem, and I'm looking into Riemannian manifold optimization to solve it. Being rather new to the topic, and without a strong background in it, I started by working in Euclidean space (though I'm looking into reformulating the problem on Grassmannian manifolds later).

The cost function is defined over complex vectors, $f:\mathbb{C}^n\to\mathbb{R}$. Moreover, the problem is phase-invariant, i.e. $f(w)=f(\alpha w)$ for all $\alpha\in\mathcal{U}=\{\alpha\in\mathbb{C}:|\alpha|=1\}$. So the problem is actually defined on a quotient manifold $\mathcal{M}=\mathbb{C}^n/\mathcal{U}$ (which is one of the motivations for using manifold optimization), although this is not central to my actual question.
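For concreteness, here is a toy NumPy sketch of such a phase-invariant cost. The function $f(w)=\|ww^{\mathsf{H}}-C\|_F^2$ is a hypothetical stand-in, not my actual problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical phase-invariant cost: f(w) = ||w w^H - C||_F^2, C Hermitian.
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
C = (C + C.conj().T) / 2

def f(w):
    return np.linalg.norm(np.outer(w, w.conj()) - C, 'fro') ** 2

w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
alpha = np.exp(1j * rng.uniform(0, 2 * np.pi))  # unit-modulus scalar

# f(alpha * w) = f(w), since (alpha w)(alpha w)^H = |alpha|^2 w w^H = w w^H.
print(np.isclose(f(w), f(alpha * w)))           # True
```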

Since the domain is complex, I use Wirtinger calculus to derive the Euclidean gradients and Hessians. Using these expressions, I obtain very nice numerical results with Riemannian Trust-Regions as implemented in Manopt, and I'm now working on the proof that the method converges globally and locally. I've done most of this work already, using the theorems from the Riemannian Trust-Regions paper: I've shown global convergence (using Corollary 4.6), superlinear local convergence (Theorem 4.13, as the Hessian is exact), and most of the conditions for local convergence (Theorem 4.12). The only missing piece of the latter is showing that the inverse Riemannian Hessian operator is bounded in a neighborhood of a local minimizer.

My problem: I haven't been able to find an expression for the inverse Riemannian Hessian.

The Wirtinger Hessian can be written as a block matrix whose blocks depend only on the evaluation point,
$$\nabla^2 f(w)=\begin{bmatrix}\mathbf{A}_w&\mathbf{B}_w\\\overline{\mathbf{B}_w}&\overline{\mathbf{A}_w}\end{bmatrix}$$
with $w\in\mathbb{C}^n$. Note that, due to Wirtinger calculus, the second-order term of an approximation of $f$ in the neighborhood of $w_0$ has the form
$$\frac{1}{2}\begin{bmatrix}\delta\\\overline{\delta}\end{bmatrix}^*\nabla^2 f(w_0)\begin{bmatrix}\delta\\\overline{\delta}\end{bmatrix} $$

I computed the Riemannian Hessian via Fréchet derivatives,
$$\langle u,\mathrm{Hess}_w f[u]\rangle=\frac{d^2}{dt^2}\Bigg|_{t=0} f(w+tu)$$
and using the notation above the expression reduces to
$$\mathrm{Hess}_w f[u] = \mathbf{A}_wu+\mathbf{B}_w\overline{u}$$
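As a sanity check, this identity can be verified numerically. The sketch below uses randomly generated blocks with the standard structure of a Wirtinger Hessian of a real-valued function ($\mathbf{A}$ Hermitian, $\mathbf{B}$ symmetric; an assumption for illustration, not my actual matrices), a toy quadratic cost built from them, and the inner product $\langle u,v\rangle = u^{\mathsf{H}}v+v^{\mathsf{H}}u$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Random blocks with the standard Wirtinger-Hessian structure (assumption):
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 2                       # A = A^H (Hermitian)
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (B + B.T) / 2                              # B = B^T (symmetric)
W = np.block([[A, B], [B.conj(), A.conj()]])   # full Wirtinger Hessian

def f(w):
    """Toy quadratic cost whose Wirtinger Hessian is exactly W."""
    z = np.concatenate([w, w.conj()])
    return 0.5 * np.real(z.conj() @ W @ z)

w = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Second directional derivative by central differences...
t = 1e-4
d2 = (f(w + t * u) - 2 * f(w) + f(w - t * u)) / t ** 2

# ...against <u, Hess[u]> = 2 Re(u^H (A u + B conj(u))).
h = A @ u + B @ u.conj()
print(np.isclose(d2, 2 * np.real(np.vdot(u, h)), rtol=1e-5))  # True
```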

My major issue is the conjugate of $u$ in this expression: whenever I try to solve $h=\mathrm{Hess}_w f[u]$ for $u$, I can't handle the conjugation. The operator is obviously not $\mathbb{C}$-linear, so I've rewritten the expression using the usual identification $\mathbb{C}^n\cong\mathbb{R}^{2n}$ (see the sketch below), but had no success isolating the terms in $u$. It might also be useful to note that
$$ \nabla^2 f(w)\begin{bmatrix}u\\\overline{u}\end{bmatrix} = \begin{bmatrix}\mathrm{Hess}_w f[u]\\\overline{\mathrm{Hess}_w f[u]}\end{bmatrix}$$
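For what it's worth, solving the system numerically through the real reformulation poses no problem; writing $u=x+iy$ gives $\mathbf{A}_wu+\mathbf{B}_w\overline{u}=(\mathbf{A}_w+\mathbf{B}_w)x+i(\mathbf{A}_w-\mathbf{B}_w)y$, which yields a real $2n\times 2n$ system. Here is a sketch (with random blocks as above, an assumption for illustration); what I'm missing is an analytic handle on this inverse, not a numerical one:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Random blocks (A Hermitian, B symmetric), an assumption for illustration.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 2
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (B + B.T) / 2

u_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
h = A @ u_true + B @ u_true.conj()             # h = Hess_w f[u_true]

# With u = x + i y:
#   Re(h) = Re(A+B) x - Im(A-B) y
#   Im(h) = Im(A+B) x + Re(A-B) y
M = np.block([
    [np.real(A + B), -np.imag(A - B)],
    [np.imag(A + B),  np.real(A - B)],
])
xy = np.linalg.solve(M, np.concatenate([h.real, h.imag]))
u = xy[:n] + 1j * xy[n:]

print(np.allclose(u, u_true))                  # True (when M is invertible)
```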

Any ideas on how to find the inverse operator? I have avoided preconditioners so far, as I don't need any in the numerical simulations, but one may be needed for this.

Best Answer

To anyone interested in this question: I contacted an expert in the field, who helped me with my questions and shortcomings, and his input seems to be enough to answer my original problem, namely proving local convergence of Riemannian Trust-Regions for my optimization problem. The following is a summary of our conversation; I hope my little journey into Riemannian manifold optimization helps others interested in the topic.

The missing part of the proof of local convergence of Riemannian Trust-Regions for my problem, using the theorem cited in the question above, is to show that the inverse Riemannian Hessian (IRH) operator $\mathcal{H}^{-1}$ is bounded in a neighborhood of a minimizer $x$, i.e., $\|\mathcal{H}_x^{-1}\|\leq M$ for some finite constant $M$.

What I tried to do was to compute the IRH operator explicitly, although that turns out not to be the only way to obtain the bound.

The Hessian at the minimizer (or at any other point) is a symmetric operator, hence it has real eigenvalues. If the eigenvalues are all nonzero, the eigenvalues of the IRH are their reciprocals; when they are moreover all positive (as at a nondegenerate minimizer), the ordering reverses: $$\lambda_1\leq\ldots\leq\lambda_N\text{ eigenvalues of }\mathcal{H} \Rightarrow \frac{1}{\lambda_1}\geq\ldots\geq\frac{1}{\lambda_N}\text{ eigenvalues of }\mathcal{H}^{-1}$$

Thus, if the smallest eigenvalue at the minimizer $x$ is strictly positive, say $\lambda_1\geq a>0$, then all eigenvalues of $\mathcal{H}_x^{-1}$ are upper-bounded by $1/a$, and its norm is therefore bounded.
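A quick NumPy illustration of this reciprocal bound, with a random symmetric positive definite matrix standing in for the Hessian (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 6

# Random symmetric positive definite stand-in for the Hessian at a minimizer.
H = rng.standard_normal((N, N))
H = H @ H.T + N * np.eye(N)                    # shift keeps eigenvalues > 0

lam = np.linalg.eigvalsh(H)                    # ascending order
lam_inv = np.linalg.eigvalsh(np.linalg.inv(H))

print(np.allclose(np.sort(1 / lam), lam_inv))  # eigenvalues of H^{-1} = 1/lam
print(lam_inv.max() <= 1 / lam[0] + 1e-12)     # ||H^{-1}||_2 = 1 / lambda_min
```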

Additionally, if the Hessian operator is continuous (it usually is, but you have to check), then the IRH operator is bounded in a whole neighborhood of the minimizer, and the proof is complete. This does not say how large the neighborhood is, but it's enough to claim existence.

Furthermore, you don't actually need to derive the eigenvalues of the Riemannian Hessian operator. Working on a Riemannian manifold means a Riemannian inner product, tangent spaces and the rest of the geometry are already defined, so the largest (resp. smallest) eigenvalue of a symmetric operator $G$ can be characterized as the maximum (resp. minimum) of the inner product $\langle u, G(u)\rangle$ over unit-norm $u$ (a Rayleigh-quotient argument). A particular case is $G(u)=\mathrm{Hess}_xf[u]$, the Riemannian Hessian at $x$, with $u$ a tangent vector at $x$ of norm 1. This means that you don't even need to express the Riemannian Hessian operator as a matrix (which is also rather hard with the conjugate of the argument in my case): one only needs to understand what $G(u)$ is and compute that inner product, which on my complex Euclidean manifold is $$ \langle u, v\rangle = 2\,\mathrm{Re}(u^{\mathsf{H}}v) = u^{\mathsf{H}}v+v^{\mathsf{H}}u$$

For the Riemannian Hessian, at a minimizer $x$, this yields $$ \langle u, \mathrm{Hess}_xf[u]\rangle = u^{\mathsf{H}}(\mathbf{A}_xu+\mathbf{B}_x\overline{u})+(\mathbf{A}_xu+\mathbf{B}_x\overline{u})^{\mathsf{H}}u = \begin{bmatrix}u&\overline{u}\end{bmatrix}^{\mathsf{H}}\nabla^2f(x) \begin{bmatrix}u\\\overline{u}\end{bmatrix}$$

which is readily computable since, in my case, $\mathbf{A}_w^{\mathsf{H}}=\overline{\mathbf{A}_w}$ and $\mathbf{B}_w^{\mathsf{H}}=\overline{\mathbf{B}_w}$ for any $w$.
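To illustrate (again with random blocks with the standard structure, as an assumption): the quadratic form can be evaluated directly from the operator, without ever forming a matrix for $\mathrm{Hess}_xf$, and it matches the block expression above:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4

# Random blocks (A Hermitian, B symmetric), an assumption for illustration.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (A + A.conj().T) / 2
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = (B + B.T) / 2
W = np.block([[A, B], [B.conj(), A.conj()]])   # full Wirtinger Hessian

def hess(u):
    """The Hessian operator u -> A u + B conj(u); no matrix form needed."""
    return A @ u + B @ u.conj()

def inner(u, v):
    """Riemannian metric <u, v> = u^H v + v^H u = 2 Re(u^H v)."""
    return 2 * np.real(np.vdot(u, v))

for _ in range(5):
    u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    u /= np.sqrt(inner(u, u))                  # unit norm in the metric
    z = np.concatenate([u, u.conj()])
    # Rayleigh quotient <u, Hess[u]> equals the block quadratic form:
    print(np.isclose(inner(u, hess(u)), np.real(z.conj() @ W @ z)))  # True
```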

Thanks for your attention!
