# Minimize a non-linear function with two variables

multivariable-calculus, nonlinear-system, optimization

Let $$g,h \in \mathbb{R}^d$$ and let $$A \in \mathbb{R}^{d \times d}$$ be symmetric. I have the following two-variable function

$$f(x,y) = \frac{1}{2} g^T\left(A + \frac{xy}{2}I\right)^{-1}g+\frac{x}{6}y^3 \tag{1}$$ with $$x\geq 0$$, $$y>0$$, $$g = -\left(A + \frac{xy}{2}I\right)h$$ and $$h \neq 0$$. Let also

$$h(x,y) = - \left(A + \frac{xy}{2}I\right)^{-1} g,\tag{2}$$ where $$g$$ is treated as a constant vector.

To minimize $$(1)$$, I took the first derivative w.r.t. $$x$$ and $$y$$, and set them equal to zero, i.e.,

\begin{aligned}\partial_x f(x,y)= -\frac{y}{4} g^T\left(A + \frac{xy}{2}I\right)^{-2}g+\frac{y^3}{6} = -\frac{y}{4} ||h(x,y)||^2 + \frac{y^3}{6} = & 0 \Leftrightarrow \\ -\frac{y}{2}\left( \frac{||h(x,y)||^2}{2} - \frac{y^2}{3}\right) = &0\end{aligned}\tag{3}
and
\begin{aligned} \partial_y f(x,y)= -\frac{x}{4} g^T\left(A + \frac{xy}{2}I\right)^{-2}g+\frac{x}{2}y^2 = -\frac{x}{4} ||h(x,y)||^2 + \frac{xy^2}{2} = 0 & \Leftrightarrow \\ -\frac{x}{2}\left(\frac{||h(x,y)||^2 }{2} - y^2\right) = &0,\end{aligned}\tag{4}

where $$g^T\left(A + \frac{xy}{2}I\right)^{-2} g = \left[g^T\left(A + \frac{xy}{2}I\right)^{-1}\right] \left[\left(A + \frac{xy}{2}I\right)^{-1} g\right] = {||h(x,y)||^2} \tag{5}$$ is used.
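Identity $$(5)$$ is easy to check numerically; the $$A$$, $$g$$, $$x$$, $$y$$ below are arbitrary illustrative values:

```python
import numpy as np

# numerical check of identity (5) with illustrative data
A = np.diag([2.0, 3.0, 5.0])           # symmetric A
g = np.array([1.0, -1.0, 2.0])
x, y = 0.7, 1.3
d = len(g)

M = A + (x * y / 2) * np.eye(d)        # A + (xy/2) I
Minv = np.linalg.inv(M)
h = -Minv @ g                          # h(x, y) from (2)

assert np.isclose(g @ Minv @ Minv @ g, h @ h)   # g^T M^{-2} g = ||h(x,y)||^2
```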

What bothers me is the optimization procedure. Suppose that, for a fixed $$x$$, we solve the non-linear equation $$||h(x,y)|| = \sqrt{2}/\sqrt{3}\,y$$ (from $$(3)$$) to get $$y$$. Then, given $$y$$, from $$(4)$$, $$x$$ is either zero or can be computed by solving $$||h(x,y)|| = \sqrt{2}\, y$$.
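For what it's worth, the first of these steps can be sketched numerically. The sketch below treats $$g$$ as a fixed vector, uses an illustrative diagonal $$A$$, and hand-rolls a bisection as a stand-in for a black-box 1-d root finder:

```python
import numpy as np

def bisect(f, a, b, tol=1e-12, max_iter=200):
    """Plain bisection; a stand-in for any black-box 1-d root finder."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "root must be bracketed"
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:          # sign change in [a, m]
            b, fb = m, fm
        else:                     # sign change in [m, b]
            a, fa = m, fm
        if b - a < tol:
            break
    return 0.5 * (a + b)

# illustrative data: diagonal A with positive eigenvalues, fixed g
A = np.diag([2.0, 3.0, 5.0])
g = np.array([1.0, 1.0, 1.0])
d = len(g)

def h_norm(x, y):
    """||h(x, y)|| from (2)."""
    return np.linalg.norm(np.linalg.solve(A + (x * y / 2) * np.eye(d), g))

# for fixed x, solve ||h(x, y)|| = sqrt(2/3) * y for y  (condition from (3))
x = 0.5
y = bisect(lambda t: h_norm(x, t) - np.sqrt(2.0 / 3.0) * t, 1e-6, 10.0)
assert abs(h_norm(x, y) - np.sqrt(2.0 / 3.0) * y) < 1e-8
```

The second step (solving for $$x$$ given $$y$$, or taking $$x=0$$) follows the same pattern with the roles of the variables swapped.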

Even though taking the derivatives and setting them to zero is an intuitive approach, I find the resulting optimization procedure rather strange. Could someone please suggest a more appropriate way to minimize $$(1)$$?

PS: In this case, is it preferable to use a black-box root finder?

EDIT1: Following @Joe's comment, I have tried to rewrite the above equations using $$u=-\frac{xy}{2}$$ and $$v=y$$. With this substitution we have

$$f(u,v) = \frac{1}{2} g^T\left(A - u\:I\right)^{-1}g-\frac{uv^2 }{3}, \quad v > 0, \quad u \leq 0 \tag{6}$$

$$h(u,v) = – \left(A – u\:I\right)^{-1} g, \quad g \neq 0 \tag{7}$$

$$\partial_u f(u,v) = \frac{1}{2} ||h(u,v)||^2 – \frac{v^2}{3}, \quad h(u,v) \neq 0 \tag{8}$$

$$\partial_v f(u,v) = -\frac{2}{3}uv \tag{9}$$
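As a quick sanity check, $$(8)$$ and $$(9)$$ can be compared against central finite differences of $$(6)$$; the $$A$$ and $$g$$ below are illustrative:

```python
import numpy as np

A = np.diag([2.0, 3.0, 5.0])           # illustrative symmetric A
g = np.array([1.0, -1.0, 2.0])
d = len(g)

def f(u, v):                           # (6)
    return 0.5 * g @ np.linalg.solve(A - u * np.eye(d), g) - u * v**2 / 3.0

def h(u, v):                           # (7)
    return -np.linalg.solve(A - u * np.eye(d), g)

u, v = -0.4, 1.2
eps = 1e-6
du = (f(u + eps, v) - f(u - eps, v)) / (2 * eps)   # central differences
dv = (f(u, v + eps) - f(u, v - eps)) / (2 * eps)

hv = h(u, v)
assert np.isclose(du, 0.5 * hv @ hv - v**2 / 3.0, atol=1e-6)   # matches (8)
assert np.isclose(dv, -2.0 * u * v / 3.0, atol=1e-6)           # matches (9)
```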

EDIT2: Using the diagonalization @Joe suggested, i.e., $$A = O\Lambda O^T$$ where $$\Lambda$$ is diagonal and $$O$$ is orthogonal, we can write

$$f(u, v) = \frac{1}{2} ||O(\Lambda -u \:I)^{-1/2}O^Tg||^2 - \frac{1}{3}uv^2, \tag{10}$$ valid when $$\Lambda - u\:I$$ is positive definite.
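A numerical check that $$(10)$$ agrees with $$(6)$$, using an illustrative $$2\times 2$$ symmetric $$A$$ with positive eigenvalues and $$u \leq 0$$ (so that $$\Lambda - uI$$ is positive definite):

```python
import numpy as np

# illustrative symmetric A built from a rotation and positive eigenvalues
theta = 0.3
O0 = np.array([[np.cos(theta), -np.sin(theta)],
               [np.sin(theta),  np.cos(theta)]])
A = O0 @ np.diag([2.0, 5.0]) @ O0.T
g = np.array([1.0, 2.0])
u, v = -0.5, 1.5

lam, O = np.linalg.eigh(A)             # A = O diag(lam) O^T
direct = 0.5 * g @ np.linalg.solve(A - u * np.eye(2), g) - u * v**2 / 3.0
z = O @ np.diag((lam - u) ** -0.5) @ O.T @ g
via_10 = 0.5 * z @ z - u * v**2 / 3.0  # right-hand side of (10)
assert np.isclose(direct, via_10)
```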

Change variables to $$u = -xy/2,\ v = y$$. This co-ordinate transformation is invertible, so it's fine to minimise in $$u,v$$ and transform back to $$x,y$$ afterwards. Then you see that $$f(u,v) = g^T(A-uI)^{-1}g/2 - u v^2/3.$$ Note that $$\det(A-uI)=0$$ whenever $$u$$ is an eigenvalue of $$A$$, so $$A-uI$$ is non-invertible for a discrete set of values of $$u$$.

This means that for general $$A$$ you should not expect to be able to minimise $$f$$. For example, let $$d=1$$. Then $$f(u,v) =\frac{1}{2} \frac{g^2}{A-u} - u v^2/3,$$ which has a simple pole when $$u=A$$. So this function is unbounded from below and has no minimum.
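This is easy to see numerically by evaluating $$f$$ just above the pole, with illustrative scalar values for $$A$$, $$g$$, $$v$$:

```python
# d = 1: f(u, v) = g^2 / (2 (A - u)) - u v^2 / 3 has a simple pole at u = A
Aval, gval, vval = 1.0, 1.0, 1.0

def f(u, v=vval):
    return gval**2 / (2.0 * (Aval - u)) - u * v**2 / 3.0

# approaching the pole from above, f decreases without bound
assert f(Aval + 1e-9) < f(Aval + 1e-6) < f(Aval + 1e-3) < 0
```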

For $$d>1$$, it will depend on the geometric multiplicities of your eigenvalues. If your matrix has all eigenvalues of even multiplicity then I believe you will get poles of the form $$1/(\lambda-u)^{2n}$$, in which case a minimum may exist. If you have any eigenvalues with odd multiplicities then your $$f$$ would be unbounded from below. So you need to look at the spectrum of eigenvalues of your $$A$$; most matrices $$A$$ would produce an $$f$$ with no minimum.

Edit: I notice that you have now got $$x\ge 0$$ and $$y>0$$, and $$A$$ symmetric. Symmetric $$A$$ is diagonalisable, and hence has no second order poles in the expansion of $$(A-uI)^{-1}$$. However, the restrictions on $$x$$ and $$y$$ may mean that the poles all fall outside of this region, which is necessary for a minimum of your function to exist. To make further progress with this problem, it would be necessary to have further information about the spectrum of eigenvalues of $$A$$. Maybe it would be sufficient to know that they were all positive or negative, although I'd need to think about this.

Diagonalize $$A$$ with an orthogonal matrix $$O$$ such that $$A=ODO^T$$, where $$D$$ has the eigenvalues $$\lambda_i$$ of $$A$$ along its diagonal. Define $$w=O^T g$$. Then your $$f$$ can be written as $$f(u,v) = w^T(D-uI)^{-1}w/2 - u v^2/3.$$ Now we can expand the sum, because $$D$$ is diagonal, to arrive at $$f(u,v) = \frac{1}{2}\sum_{i=1}^d \frac{w_i^2}{\lambda_i -u} - u v^2/3.$$ For a minimum to exist, all of the poles in this expression must lie outside the region over which you want to minimise.
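A small numerical confirmation that this partial-fraction expansion matches the matrix form, with an illustrative $$A$$ and $$g$$:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative symmetric A
g = np.array([1.0, -2.0])
u, v = -0.7, 1.1

lam, O = np.linalg.eigh(A)               # A = O D O^T, lam holds lambda_i
w = O.T @ g
expanded = 0.5 * np.sum(w**2 / (lam - u)) - u * v**2 / 3.0
direct = 0.5 * g @ np.linalg.solve(A - u * np.eye(2), g) - u * v**2 / 3.0
assert np.isclose(expanded, direct)
```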

Edit2: Suppose that $$A$$ has all positive eigenvalues. Now look for points where the gradient of $$f$$ is zero: $$\partial_u f =\frac{1}{2}\sum_{i=1}^d \frac{w_i^2}{(\lambda_i -u)^2} - v^2/3=0$$ and $$\partial_v f = -2u v/3=0.$$

This is equivalent to a set of simultaneous polynomial equations. The second equation is solved either by $$u=0$$ or $$v=0$$. So firstly we see that there are two solutions of the form $$u=0,\ v =\pm\sqrt{\frac{3}{2}\sum_{i=1}^d \frac{w_i^2}{\lambda_i^2}}.$$ Based on the fact that you want $$y=v>0$$, you can throw away the negative square root here.
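With illustrative data, this stationary point can be computed directly and the gradient verified to vanish there:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])    # illustrative A, positive eigenvalues
g = np.array([1.0, -2.0])

lam, O = np.linalg.eigh(A)
w = O.T @ g

u = 0.0
v = np.sqrt(1.5 * np.sum(w**2 / lam**2))  # positive root, since v = y > 0

du = 0.5 * np.sum(w**2 / (lam - u)**2) - v**2 / 3.0   # partial_u f
dv = -2.0 * u * v / 3.0                               # partial_v f
assert np.isclose(du, 0.0) and np.isclose(dv, 0.0)
```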

It is also possible to solve these equations by setting $$v=0$$, which is different from the 1d case I described in the comments. In that case you would have a polynomial equation to solve, obtained by multiplying through by all of the denominators in the equation $$\sum_{i=1}^d \frac{w_i^2}{(\lambda_i -u)^2}=0.$$ If you are having trouble seeing this, try setting $$d=2$$ and then $$d=3$$, and write out the equation in both cases. Given that you asked for $$v>0$$ in your question, you may not be interested in this; however, it could mean that the minimum of the function lies along $$v=0$$, in which case no minimum exists for $$v>0$$.

In all cases, solving $$\partial_u f = \partial_v f = 0$$ is neither a necessary nor a sufficient condition for $$f$$ to be minimised at these points: you will need to examine each point and check whether it is a minimum, maximum or saddle point, and also look along the boundaries $$u=0$$ and $$v=0$$ in case the function is minimised there.
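For instance, at the $$u=0$$ stationary point found above, $$\partial_{vv} f = -2u/3 = 0$$, so the Hessian determinant is $$-(\partial_{uv} f)^2 < 0$$ and that point is a saddle. A sketch of this second-order check with illustrative data:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])     # illustrative A, positive eigenvalues
g = np.array([1.0, -2.0])
lam, O = np.linalg.eigh(A)
w = O.T @ g

u = 0.0
v = np.sqrt(1.5 * np.sum(w**2 / lam**2))   # the u = 0 stationary point

# Hessian of f(u, v) at this point
fuu = np.sum(w**2 / (lam - u)**3)          # d/du of partial_u f
fuv = -2.0 * v / 3.0                        # mixed partial
fvv = -2.0 * u / 3.0                        # d/dv of partial_v f (= 0 here)
H = np.array([[fuu, fuv], [fuv, fvv]])

# det H = -fuv^2 < 0, so the Hessian is indefinite: a saddle point
assert np.linalg.det(H) < 0
```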