Let $g,h \in \mathbb{R}^d$ and let $A \in \mathbb{R}^{d \times d}$ be symmetric. Consider the following two-variable function

$$f(x,y) = \frac{1}{2} g^T\left(A + \frac{xy}{2}I\right)^{-1}g+\frac{x}{6}y^3 \tag{1}$$ with $x\geq 0$, $ y>0$, $g = -\left(A + \frac{xy}{2}I\right)h $ and $h \neq 0$. Let also

$$h(x,y) = - \left(A + \frac{xy}{2}I\right)^{-1} g,\tag{2}$$ where $g$ is the constant vector defined above.

To minimize $(1)$, I took the partial derivatives w.r.t. $x$ and $y$ and set them equal to zero, i.e.,

$$\begin{aligned}\partial_x f(x,y)= -\frac{y}{4} g^T\left(A + \frac{xy}{2}I\right)^{-2}g+\frac{y^3}{6} = -\frac{y}{4} ||h(x,y)||^2 + \frac{y^3}{6} = & 0 \Leftrightarrow \\ -\frac{y}{2}\left( \frac{||h(x,y)||^2}{2} - \frac{y^2}{3}\right) = &0\end{aligned}\tag{3}$$

and

$$\begin{aligned} \partial_y f(x,y)= -\frac{x}{4} g^T\left(A + \frac{xy}{2}I\right)^{-2}g+\frac{x}{2}y^2 = -\frac{x}{4} ||h(x,y)||^2 + \frac{xy^2}{2} = 0 & \Leftrightarrow \\ -\frac{x}{2}\left(\frac{||h(x,y)||^2 }{2} - y^2\right) = &0,\end{aligned}\tag{4}$$

where $$g^T\left(A + \frac{xy}{2}I\right)^{-2} g = \left[g^T\left(A + \frac{xy}{2}I\right)^{-1}\right] \left[\left(A + \frac{xy}{2}I\right)^{-1} g\right] = {||h(x,y)||^2} \tag{5}$$ is used.
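The identity $(5)$ is easy to sanity-check numerically; a minimal sketch in Python, where the particular $A$, $g$, $x$, $y$ are arbitrary illustrative choices (with $A$ positive definite so the inverse exists):

```python
import numpy as np

# Numerical sanity check of identity (5): g^T M^{-2} g == ||h||^2,
# with M = A + (x*y/2) I and h = -M^{-1} g.
# The specific A, g, x, y below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)   # symmetric positive definite, so M is invertible
g = rng.standard_normal(d)
x, y = 1.5, 0.7

M = A + 0.5 * x * y * np.eye(d)
h = -np.linalg.solve(M, g)                            # h(x, y) from (2)
lhs = g @ np.linalg.solve(M, np.linalg.solve(M, g))   # g^T M^{-2} g
rhs = h @ h                                           # ||h(x, y)||^2
print(np.allclose(lhs, rhs))                          # True
```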

What bothers me is the optimization procedure. Suppose that, for a fixed $x$, we have solved the non-linear equation $||h(x,y)|| = \sqrt{2}/\sqrt{3}\,y$ (from $(3)$) to get $y$. Then, given $y$, from $(4)$, $x$ is either zero or can be computed by solving $||h(x,y)|| = \sqrt{2}\, y$.

Even though taking the derivatives and setting them to zero is an intuitive approach, I find the resulting optimization procedure rather strange. Could someone please give some suggestions on a more appropriate way to minimize $(1)$?

PS: In this case, is it preferable to use a black-box root finder?
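For what it's worth, handing $(1)$ to a black-box bounded minimiser is straightforward to try; a sketch using `scipy.optimize.minimize`, where $A$ and $g$ are illustrative stand-ins, not the actual problem data:

```python
import numpy as np
from scipy.optimize import minimize

# Black-box minimisation of (1); A and g below are arbitrary
# illustrative choices (A symmetric positive definite, so that
# A + (x*y/2) I stays invertible on x >= 0, y > 0).
rng = np.random.default_rng(1)
d = 3
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)
g = rng.standard_normal(d)

def f(p):
    x, y = p
    M = A + 0.5 * x * y * np.eye(d)
    return 0.5 * g @ np.linalg.solve(M, g) + x * y**3 / 6.0

p0 = np.array([1.0, 1.0])
res = minimize(f, p0, method="L-BFGS-B",
               bounds=[(0.0, None), (1e-8, None)])  # x >= 0, y > 0
print(res.x, res.fun)
```

Whether this is meaningful depends on whether a minimum exists at all, which is exactly what the answer below addresses.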

**EDIT1:** After @Joe's comment I have rewritten the above equations using $u=-\frac{xy}{2}$ and $v=y$. With this substitution we have

$$f(u,v) = \frac{1}{2} g^T\left(A - u\:I\right)^{-1}g-\frac{uv^2}{3}, \quad v > 0, \quad u \leq 0 \tag{6}$$

$$h(u,v) = - \left(A - u\:I\right)^{-1} g, \quad g \neq 0 \tag{7}$$

$$\partial_u f(u,v) = \frac{1}{2} ||h(u,v)||^2 - \frac{v^2}{3}, \quad h(u,v) \neq 0 \tag{8}$$

$$\partial_v f(u,v) = -\frac{2}{3}uv \tag{9}$$
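The expressions $(8)$ and $(9)$ can be checked against central finite differences; a small sketch with illustrative $A$, $g$ and an evaluation point chosen so that $A - u\,I$ stays invertible:

```python
import numpy as np

# Finite-difference check of (8) and (9) at an illustrative point
# (A positive definite and u <= 0, so A - u I stays invertible).
rng = np.random.default_rng(4)
d = 3
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)
g = rng.standard_normal(d)

def f(u, v):
    return 0.5 * g @ np.linalg.solve(A - u * np.eye(d), g) - u * v**2 / 3.0

u0, v0, eps = -0.4, 0.9, 1e-6
h0 = -np.linalg.solve(A - u0 * np.eye(d), g)
du_analytic = 0.5 * h0 @ h0 - v0**2 / 3.0   # equation (8)
dv_analytic = -2.0 * u0 * v0 / 3.0          # equation (9)
du_numeric = (f(u0 + eps, v0) - f(u0 - eps, v0)) / (2 * eps)
dv_numeric = (f(u0, v0 + eps) - f(u0, v0 - eps)) / (2 * eps)
print(np.isclose(du_analytic, du_numeric, atol=1e-4),
      np.isclose(dv_analytic, dv_numeric, atol=1e-4))   # True True
```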

**EDIT2:** Using the diagonalization @Joe suggested, i.e., $A = O\Lambda O^T$ where $\Lambda$ is diagonal and $O$ is orthogonal, we can write

$$f(u, v) = \frac{1}{2} ||O(\Lambda - u \:I)^{-1/2}O^Tg||^2 - \frac{1}{3}uv^2, \tag{10}$$ which is valid when $u$ lies below the smallest eigenvalue of $A$, so that $\Lambda - u\:I$ is positive definite.
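A quick numerical check that the diagonalized quadratic term agrees with the one in $(6)$, with illustrative $A$, $g$ and a value of $u$ below the smallest eigenvalue:

```python
import numpy as np

# Check the diagonalized form against (6): with A = O Λ O^T and u below
# the smallest eigenvalue, (1/2) g^T (A - u I)^{-1} g equals
# (1/2) Σ_i w_i^2 / (λ_i - u) with w = O^T g.  A, g, u are illustrative.
rng = np.random.default_rng(2)
d = 5
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)          # symmetric positive definite
g = rng.standard_normal(d)
u = -0.5                             # u <= 0 keeps Λ - u I positive

lam, O = np.linalg.eigh(A)           # A = O diag(lam) O^T
quad = 0.5 * g @ np.linalg.solve(A - u * np.eye(d), g)
diag = 0.5 * np.sum((O.T @ g)**2 / (lam - u))
print(np.allclose(quad, diag))       # True
```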

## Best Answer

Change variables to $u = -xy/2, v = y$. This co-ordinate transformation is invertible, so it's fine to minimise in $u,v$ and transform back to $x,y$ afterwards. Then you see that $$f(u,v) = g^T(A-uI)^{-1}g/2 - u v^2/3.$$ Note that $\det(A-uI)=0$ whenever $u$ is an eigenvalue of $A$, so $A-uI$ fails to be invertible on a discrete set of values of $u$.

This means that for general $A$ you should not expect to be able to minimise $f$. For example, let $d=1$. Then $$f(u,v) =\frac{1}{2} \frac{g^2}{A-u} - u v^2/3,$$ which has a simple pole when $u=A$. So this function is unbounded from below and has no minimum.
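This unboundedness is easy to see numerically in the $d=1$ case; a tiny sketch with illustrative values of $A$, $g$ and $v$:

```python
# d = 1 illustration: f(u, v) = g^2 / (2 (A - u)) - u v^2 / 3 has a
# simple pole at u = A, so f is unbounded from below.  The values of
# A, g, v here are arbitrary illustrative choices.
A1, g1, v = 2.0, 1.0, 1.0
f1 = lambda u: g1**2 / (2.0 * (A1 - u)) - u * v**2 / 3.0

# Approach the pole from above: f decreases without bound.
vals = [f1(A1 + 10.0**(-k)) for k in range(1, 6)]
print(all(b < a for a, b in zip(vals, vals[1:])))  # True: strictly decreasing
```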

For $d>1$, it will depend on the geometric multiplicities of your eigenvalues. If your matrix has all eigenvalues of even multiplicity then I believe you will get poles of the form $1/(\lambda-u)^{2n}$, in which case a minimum may exist. If you have any eigenvalues with odd multiplicities then your $f$ would be unbounded from below. So you need to look at the spectrum of eigenvalues of your $A$; most matrices $A$ would produce an $f$ with no minimum.

Edit: I notice that you have now got $x\ge 0$ and $y>0$, and $A$ symmetric. Symmetric $A$ is diagonalisable, and hence there are no second-order poles in the expansion of $(A-uI)^{-1}$. However, the restrictions on $x$ and $y$ may mean that the poles all fall outside of this region, which would be necessary for a minimum of your function to exist. To make further progress with this problem, it would be necessary to have further information about the spectrum of eigenvalues of $A$. Maybe it would be sufficient to know that they were all positive or all negative, although I'd need to think about this.

Diagonalize $A$ with an orthogonal matrix $O$ such that $A=ODO^T$, where $D$ has the eigenvalues $\lambda_i$ of $A$ along its diagonal. Define $w=O^T g$. Then your $f$ can be written as $$f(u,v) = w^T(D-uI)^{-1}w/2 - u v^2/3.$$ Now we can expand the sum because $D$ is diagonal to arrive at $$f(u,v) = \frac{1}{2}\sum_{i=1}^d \frac{w_i^2}{\lambda_i -u} - u v^2/3.$$ It is necessary to know that all of the poles in this expression lie outside of the region which you want to minimise in for a minimum to exist.

Edit2: Suppose that $A$ has all positive eigenvalues. Now look for points where the gradient of $f$ is zero: $$\partial_u f =\frac{1}{2}\sum_{i=1}^d \frac{w_i^2}{(\lambda_i -u)^2} - v^2/3=0$$ and $$ \partial_v f = -2u v/3=0.$$

This is equivalent to a set of simultaneous polynomial equations. The second equation is solved either by $u=0$ or $v=0$. So firstly we see that there are two solutions of the form $$u=0, v =\pm\sqrt{\frac{3}{2}\sum_{i=1}^d \frac{w_i^2}{(\lambda_i)^2}}.$$Based on the fact that you want $y=v>0$, you can throw away the negative square root here.
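One can check this stationary point numerically; a sketch with an illustrative positive-definite $A$ and vector $g$:

```python
import numpy as np

# Verify the stationary point u = 0, v = sqrt((3/2) Σ w_i^2 / λ_i^2):
# both partial derivatives from Edit2 vanish there.  A and g are
# illustrative, with all eigenvalues positive as assumed in the answer.
rng = np.random.default_rng(3)
d = 4
B = rng.standard_normal((d, d))
A = B @ B.T + d * np.eye(d)
g = rng.standard_normal(d)

lam, O = np.linalg.eigh(A)
w = O.T @ g
u_star = 0.0
v_star = np.sqrt(1.5 * np.sum(w**2 / lam**2))   # positive root only

df_du = 0.5 * np.sum(w**2 / (lam - u_star)**2) - v_star**2 / 3.0
df_dv = -2.0 * u_star * v_star / 3.0
print(np.isclose(df_du, 0.0), np.isclose(df_dv, 0.0))   # True True
```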

It is also possible to solve this equation by setting $v=0$, which is different from the 1d case I described in the comments. In that case you would have a polynomial equation to solve, after multiplying through by all of the denominators in the following equation $$\sum_{i=1}^d \frac{w_i^2}{(\lambda_i -u)^2}=0.$$ If you are having trouble understanding this, try setting $d=2$ and then $d=3$, and look at the equation in both cases. Given that you asked for $v>0$ in your question, you may not be interested in this; however, this could mean that the minimum of the function lies along $v=0$, and hence no minimum exists for $v>0$.
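For instance, carrying out the $d=2$ case explicitly with illustrative values of $w_i$ and $\lambda_i$: multiplying through by the denominators gives a quadratic in $u$, and since the left-hand side is a sum of squares, nonzero $w_i$ force its roots to be complex here:

```python
import numpy as np

# For d = 2, multiplying Σ w_i^2/(λ_i - u)^2 = 0 through by the
# denominators gives w1^2 (λ2 - u)^2 + w2^2 (λ1 - u)^2 = 0.
# Illustrative values below; with both w_i nonzero the left-hand side
# is a sum of squares, so the roots come out complex.
w1, w2, l1, l2 = 1.0, 2.0, 1.0, 3.0
coeffs = [w1**2 + w2**2,                     # u^2 coefficient
          -2.0 * (w1**2 * l2 + w2**2 * l1),  # u coefficient
          w1**2 * l2**2 + w2**2 * l1**2]     # constant term
roots = np.roots(coeffs)
print(np.all(np.abs(roots.imag) > 0))        # True: no real solutions here
```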

In all cases, solving $\partial_u f = \partial_v f = 0$ is neither a necessary nor a sufficient condition for $f$ to be minimised at these points; you will need to examine each stationary point and check whether it is a minimum, maximum or saddle point, and also look along the boundaries $u=0$ and $v=0$ in case the function is minimised there.