I am trying to understand a few things about sequential and simultaneous optimization in [1]. In this post, it is shown that
$$\max_{x} \max_{y} f(x,y) = \max_{x,\ y} f(x,y).\tag{1}$$
Thanks to @Shiv Tavker comment, I understand that in order to get the optimal value $z^*=(x^*, y^*)$ in the RHS of $(1)$, we have to solve the system $\nabla_z f(z) = 0$ w.r.t. $z$, where $z = (x,y)$.
In addition, from the LHS of $(1)$ we have,
$${x^*}' = \arg \max_x f(x, y)\tag{2}$$ and $${y^*}' = \arg \max_y f({x^*}', y)\tag{3}.$$ Let ${z^*}' = ({x^*}', {y^*}')$.
As far as I understand, I think that given $(1)$, we can state that ${z^*}' \equiv {z^*}$. However, I am thinking if there are any cases that ${z^*}' \equiv {z^*}$ does not hold? Could you please someone give some comments or an answer of things are not so simple? Any help is highly appreciated.
EDIT1: Let $$\mathcal{f}(x, y) = -\frac{1}{2}\:\mathbf{z}^T \left(\mathbf{A} + \frac{xy}{2} \:\mathbf{I}\right)^{-1} \mathbf{z} – \frac{y}{6} \lambda – \frac{y}{12}x^3,$$ where $x\geq 0$, $y,\lambda > 0$, $\mathbf{A}$ a real symmetric positive semmi-definite, and $\mathbf{z}$ are fixed.
EDIT2: Let $\mathbf{t}(x,y) = – (\mathbf{A} + 0.5 x y\: \mathbf{I})^{-1} \mathbf{z}$. If we first solve $\partial_x f(x,y) = 0$ we get $$x = \| \mathbf{t}(x,y)\|\tag{4},$$ for $y >0$. Then, if we solve $\partial_y f(x,y) = 0$ and use $(4)$ we get $$\sqrt[3]{\lambda} = \| \mathbf{t}(x,y)\|.\tag{5}$$
Next, suppose that we first solve $x = \| \mathbf{t}(x,y)\|$ w.r.t. $x$ to get an optimal ${x^*}'$ and then solve $\sqrt[3]{\lambda} = \| \mathbf{t}({x^*}', y)\|$ w.r.t. $y$ to get ${y^*}'$. Can we say that ${z^*}' \equiv {z^*}$?
Best Answer
If $A$ is positive definite then $A=X^TDX$, where $D$ is a diagonal matrix with the eigenvalues $\lambda_i$ of $A$. Hence $$z^T\left(A+\frac{xy}{2}I\right)^{-1}z=(Xz)^T\begin{pmatrix} \lambda_1+\frac{xy}{2} & 0&\cdots&0 \\ 0 &\lambda_2+\frac{xy}{2}&\cdots&0 \\\vdots&\vdots&\ddots&0\\0&0&\cdots\lambda_n+\frac{xy}{2} \end{pmatrix}^{-1}(Xz)=\sum_{i=1}^nw_i^2\left(\lambda_i+\frac{xy}{2}\right)^{-1},$$ with $w=Xz$.
Otherwise, if you denote $$M=A+\frac{xy}{2}I,$$ you can see that $MM^{-1}=I$ and follows that $$\frac{\partial\,M}{\partial\,x}M^{-1}+M\frac{\partial\,M^{-1}}{\partial\,x}=0.$$ You find that $$\frac{\partial\,M^{-1}}{\partial\,x}=-\frac{y}{2}\left(M^{-1}\right)^2.$$ Hence, if $$\frac{\partial\,f(x,y)}{\partial\,x}=\frac{y}{4}z^T\left(M^{-1}\right)^2z-\frac{y}{4}x^2=0.$$ The condition on $y>0$ gives you $$x^2=z^T\left(M^{-1}\right)^2z.$$
If you also have $$\frac{\partial\,f(x,y)}{\partial\,y}=\frac{x}{4}z^T\left(M^{-1}\right)^2z-\frac{\lambda}{6}-\frac{1}{12}x^3=0.$$
It follows that $$\frac{x}{4}x^2-\frac{\lambda}{6}-\frac{1}{12}x^3=0.$$ And you have $x^*=\sqrt[3]{\lambda}$.
This helps you to handle with the equation $\nabla f(x,y)=(0,0)$.
Remark:
Equation (2) probably gives you implicit curves, say $X(x,y)=0$, hence your equation (3) looks to becomes a constrained optimization problem, which can be harder to solve (this thread can be useful). If this last problem has only one solution then this thread imply that it is the one you are looking for.
In a general case equations $\partial_x f(x,y) = 0$ and $\partial_y f(x,y) = 0$ probably gives implicit curves to you, say $u(x,y)=0$ and $v(x,y)=0$. This gives you a system of equations, with two equations and two variables, that you can try to solve numerically, like with Newton’s method or other that you can find on SearchOnMath.
The set of solutions of $X(x,y)=0$ is a subset of the set of solution of $u(x,y)=0$.