[Math] Optimization with implicit differentiation

calculus, multivariable-calculus, optimization

I want to find the extrema of a function $F(x,y,z)$ subject to a constraint $g(x,y,z)=c$. The constraint implicitly defines $z=z(x,y)$. So I find $ \frac {\partial F}{\partial x}$ and $ \frac {\partial F}{\partial y}$, using the chain rule to take into account that $z=z(x,y)$. Then I find the points $(x,y,z)$ that solve $\frac {\partial F}{\partial x}=0$ and $\frac {\partial F}{\partial y}=0$.

For example, let $F(x,y,z)=20+2x+2y+z^2$ and $g(x,y,z)=x^2+y^2+z^2=11$. So
$$\frac {\partial F}{\partial x}=2+2z\frac {\partial z}{\partial x}$$ and $$\frac {\partial F}{\partial y}=2+2z\frac {\partial z}{\partial y}$$
To find $\frac {\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ I use the constraint equation, taking into consideration that the implicit derivatives are $\frac{\partial z}{\partial x}=-\frac{g_x}{g_z}$ and $\frac{\partial z}{\partial y}=-\frac{g_y}{g_z}$. Given that $g_x=2x$, $g_y=2y$ and $g_z=2z$, then $\frac{\partial z}{\partial x}=-\frac{2x}{2z}=-\frac{x}{z}$ and $\frac{\partial z}{\partial y}=-\frac{2y}{2z}=-\frac{y}{z}$. So,
$$0=\frac {\partial F}{\partial x}=2+2z\frac {-x}{z}=2(1-x)$$ and $$0=\frac {\partial F}{\partial y}=2+2z\frac {-y}{z}=2(1-y)$$ We see that $x=1$ and $y=1$. Replacing these values in the constraint equation we find $z$: $$1^2+1^2+z^2=11 \Rightarrow z=\pm 3$$
We can check that $(x,y,z)=(1,1,3)$ and $(x,y,z)=(1,1,-3)$ are maxima.
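This single-constraint computation can be sanity-checked numerically. The sketch below (my addition, not part of the original argument) verifies that both candidate points lie on the sphere and satisfy the stationarity conditions $2(1-x)=0$, $2(1-y)=0$, and then confirms the maximum claim by projecting nearby points back onto the sphere and comparing values of $F$; `project_to_sphere` is an illustrative helper, not a standard function.

```python
import math

def F(x, y, z):
    return 20 + 2*x + 2*y + z**2

def g(x, y, z):
    return x**2 + y**2 + z**2

# Candidate extrema from the implicit-differentiation argument
candidates = [(1, 1, 3), (1, 1, -3)]
for (x, y, z) in candidates:
    assert abs(g(x, y, z) - 11) < 1e-12          # on the constraint surface
    assert 2*(1 - x) == 0 and 2*(1 - y) == 0     # dF/dx = dF/dy = 0

def project_to_sphere(p, r=math.sqrt(11)):
    """Rescale a point onto the sphere x^2 + y^2 + z^2 = 11 (illustrative helper)."""
    n = math.sqrt(sum(c*c for c in p))
    return tuple(r*c/n for c in p)

# Nearby feasible points all give a smaller F, consistent with a local maximum
x0, y0, z0 = 1, 1, 3
for dx, dy in [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.1)]:
    q = project_to_sphere((x0 + dx, y0 + dy, z0))
    assert F(*q) < F(x0, y0, z0)
```

The same comparison works at $(1,1,-3)$, since $F$ depends on $z$ only through $z^2$.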

The real question starts here. How can I apply this method of optimization (I think it's called "constrained optimization with implicit differentiation") if I have two (or more) constraint equations instead of one?

For example, I want to optimize the (same) function $F(x,y,z)=20+2x+2y+z^2$ constrained by $g_1(x,y,z)=x^2+y^2+z^2=11$ and $g_2(x,y,z)=x+y+z=3$.

$\frac {\partial F}{\partial x}$ and $\frac {\partial F}{\partial y}$ still are
$$\frac {\partial F}{\partial x}=2+2z\frac {\partial z}{\partial x} \tag{1} $$ and $$\frac {\partial F}{\partial y}=2+2z\frac {\partial z}{\partial y} \tag{2} $$ but now
$$\frac{\partial z}{\partial x}=-\frac{{g_1}_x}{{g_1}_z}=-\frac{2x}{2z}=-\frac{x}{z} \quad\text{and also}\quad \frac{\partial z}{\partial x}=-\frac{{g_2}_x}{{g_2}_z}=-\frac{1}{1}=-1 \tag{3}$$
and
$$\frac{\partial z}{\partial y}=-\frac{{g_1}_y}{{g_1}_z}=-\frac{2y}{2z}=-\frac{y}{z} \quad\text{and also}\quad \frac{\partial z}{\partial y}=-\frac{{g_2}_y}{{g_2}_z}=-\frac{1}{1}=-1 \tag{4}$$
(This might suggest that the extrema satisfy $x=y=z$, but I've checked with another method of optimization and this is wrong.) So, which of these $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ should I plug into equations $(1)$ and $(2)$ in order to find the constrained extrema of $F(x,y,z)$?

(I know that I could solve this problem using direct substitution or Lagrange multipliers, but that's not what I'm asking in this question.)


Edit: Reading the answers and reading about the implicit function theorem (in chapter 2 of Advanced Calculus, by Kaplan), I finally understood how to solve it.

For handier notation I rename the constraints $g_1(x,y,z)=c_1$ and $g_2(x,y,z)=c_2$ as $g(x,y,z)=c_1$ and $h(x,y,z)=c_2$, respectively.

First of all, we should realize that the equations $g(x,y,z)=c_1$ and $h(x,y,z)=c_2$ imply that there are two dependent variables and one independent variable; for example, $y=y(x)$ and $z=z(x)$, with $x$ the independent variable.

Taking differentials of $g(x,y,z)=c_1$ and $h(x,y,z)=c_2$, we have
$dg=g_x dx+g_y dy+g_z dz=0$ and $dh=h_x dx+h_y dy+h_z dz=0$. These can be rearranged as $-g_x dx=g_y dy+g_z dz$ and $-h_x dx=h_y dy+h_z dz$, which in matrix form is

$$
\binom{-g_x dx}{-h_x dx} = \begin{bmatrix} g_y & g_z \\ h_y & h_z \end{bmatrix} \binom{dy}{dz}
$$

Here we want to solve for $dy$ as $dy=G(x,y,z)\ dx$ and for $dz$ as $dz=H(x,y,z)\ dx$, where $G$ and $H$ are two as-yet-unknown functions. We could solve the system by row operations or by Cramer's rule. Using Cramer's rule we get:

$$ dy = \frac{\begin{vmatrix} -g_x\ dx & g_z \\ -h_x\ dx & h_z \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } = - \frac{\begin{vmatrix} g_x & g_z \\ h_x & h_z \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } dx$$

$$ dz = \frac{\begin{vmatrix} g_y & -g_x\ dx \\ h_y & -h_x\ dx \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } = - \frac{\begin{vmatrix} g_y & g_x \\ h_y & h_x \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } dx $$
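As a quick numeric sanity check (my addition, with arbitrary sample coefficients standing in for the partial derivatives), the Cramer's-rule expressions can be multiplied back into the $2\times 2$ system:

```python
# Sample coefficients standing in for g_y, g_z, h_y, h_z, and a sample
# right-hand side playing the role of (-g_x dx, -h_x dx)
gy, gz, hy, hz = 2.0, -1.0, 1.0, 1.0
rhs_y, rhs_z = -3.0, -0.5

det = gy*hz - gz*hy                  # denominator |g_y g_z; h_y h_z|
dy = (rhs_y*hz - gz*rhs_z) / det     # Cramer: replace the first column by the RHS
dz = (gy*rhs_z - rhs_y*hy) / det     # Cramer: replace the second column by the RHS

# Multiplying back, the original system must be satisfied
assert abs(gy*dy + gz*dz - rhs_y) < 1e-12
assert abs(hy*dy + hz*dz - rhs_z) < 1e-12
```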

Rearranging and realizing that the determinants are Jacobians, we get

$$ \frac{dy}{dx} = - \frac{\begin{vmatrix} g_x & g_z \\ h_x & h_z \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } = - \frac{ \frac{\partial(g,h)}{\partial(x,z)} }{ \frac{\partial(g,h)}{\partial(y,z)} } $$

$$ \frac{dz}{dx} = - \frac{\begin{vmatrix} g_y & g_x \\ h_y & h_x \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } = - \frac{ \frac{\partial(g,h)}{\partial(y,x)} }{ \frac{\partial(g,h)}{\partial(y,z)} } $$

Now I will use this to solve the constrained optimization exercise put above, which is:

Optimize $F(x,y,z)=20+2x+2y+z^2$ constrained by $g(x,y,z)=x^2+y^2+z^2=11$ and $h(x,y,z)=x+y+z=3$.

These constraints imply that two variables are dependent and one is independent; for example, $y=y(x)$ and $z=z(x)$. So, using the chain rule,
$$ \frac{\partial F}{\partial x} = \frac{\partial}{\partial x} \left ( 2x \right ) + \frac{\partial}{\partial y} \left ( 2y \right ) \frac{dy}{dx} + \frac{\partial}{\partial z} \left ( z^2 \right ) \frac{dz}{dx} =2+2\frac{dy}{dx}+2z\frac{dz}{dx}=0$$

To compute $\frac{dy}{dx}$ and $\frac{dz}{dx}$ I use the above result,

$$g_x=2x\ \ \ \ \ \ g_y=2y\ \ \ \ \ \ g_z=2z$$
$$h_x=1\ \ \ \ \ \ h_y=1\ \ \ \ \ \ h_z=1$$

$$ \frac{dy}{dx} = - \frac{\begin{vmatrix} g_x & g_z \\ h_x & h_z \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } = - \frac{\begin{vmatrix} 2x & 2z \\ 1 & 1 \end{vmatrix}}{\begin{vmatrix} 2y & 2z \\ 1 & 1 \end{vmatrix} } = -\frac{2x-2z}{2y-2z}= -\frac{x-z}{y-z} $$

$$ \frac{dz}{dx} = - \frac{\begin{vmatrix} g_y & g_x \\ h_y & h_x \end{vmatrix}}{\begin{vmatrix} g_y & g_z \\ h_y & h_z \end{vmatrix} } = - \frac{\begin{vmatrix} 2y & 2x \\ 1 & 1 \end{vmatrix}}{\begin{vmatrix} 2y & 2z \\ 1 & 1 \end{vmatrix} } = - \frac{2y-2x}{2y-2z}=- \frac{y-x}{y-z} $$
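These Jacobian-ratio formulas can be checked by finite differences. The sketch below (my addition; `solve_yz` is an illustrative helper) holds $x$ fixed, solves the two constraints for $y$ and $z$ with Newton's method starting from the feasible point $(3,-1,1)$, and compares a central difference against the formulas, which give $dy/dx = 1$ and $dz/dx = -2$ at that point.

```python
# Base feasible point satisfying g = x^2 + y^2 + z^2 = 11 and h = x + y + z = 3
x0, y0, z0 = 3.0, -1.0, 1.0

def solve_yz(xv, y, z):
    """Newton's method on the residuals (g - 11, h - 3) in the unknowns (y, z),
    with x held fixed. The Jacobian [[2y, 2z], [1, 1]] is inverted by hand."""
    for _ in range(50):
        r1 = xv*xv + y*y + z*z - 11   # residual of g = 11
        r2 = xv + y + z - 3           # residual of h = 3
        det = 2*y - 2*z               # det of [[2y, 2z], [1, 1]]
        step_y = (r1 - 2*z*r2) / det
        step_z = (-r1 + 2*y*r2) / det
        y, z = y - step_y, z - step_z
    return y, z

h = 1e-6
yp, zp = solve_yz(x0 + h, y0, z0)
ym, zm = solve_yz(x0 - h, y0, z0)
dydx_fd = (yp - ym) / (2*h)
dzdx_fd = (zp - zm) / (2*h)

# Compare with dy/dx = -(x - z)/(y - z) and dz/dx = -(y - x)/(y - z)
assert abs(dydx_fd - (-(x0 - z0) / (y0 - z0))) < 1e-5
assert abs(dzdx_fd - (-(y0 - x0) / (y0 - z0))) < 1e-5
```

As a cross-check, $1 + dy/dx + dz/dx = 0$ there, as differentiating $x+y+z=3$ requires.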

So, we get

$$ \frac{\partial F}{\partial x} = 2-2 \frac{x-z}{y-z}-2z \frac{y-x}{y-z}=2\frac{(y-z)-(x-z)-z(y-x)}{y-z}=2\frac{(y-x)-z(y-x)}{y-z}=2\frac{(y-x)(1-z)}{y-z}=0 \ \ \ \ (y\neq z) $$

Solving the system of equations

$$ 2\frac{(y-x)(1-z)}{y-z}=0 \tag{E1} $$
$$ x^2+y^2+z^2=11 \tag{E2} $$
$$ x+y+z=3 \tag{E3} $$

we obtain that the extrema are at the points

$$ \left (x,y,z\right )= \left \{ \left (1+\frac{2}{\sqrt{3}},1+\frac{2}{\sqrt{3}},1-\frac{4}{\sqrt{3}} \right ),\left (1-\frac{2}{\sqrt{3}},1-\frac{2}{\sqrt{3}},1+\frac{4}{\sqrt{3}} \right ),\left (3,-1,1\right ),\left (-1,3,1\right ) \right \} $$

which is correct according to Wolfram Alpha (the first and second points are maxima and the third and fourth are minima).
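A short numeric check (my addition) confirms that all four points satisfy both constraints, that the first pair shares the larger value of $F$ and the second pair the smaller:

```python
import math

s = math.sqrt(3)
pts = [
    (1 + 2/s, 1 + 2/s, 1 - 4/s),
    (1 - 2/s, 1 - 2/s, 1 + 4/s),
    (3, -1, 1),
    (-1, 3, 1),
]

def F(x, y, z):
    return 20 + 2*x + 2*y + z**2

for (x, y, z) in pts:
    assert abs(x*x + y*y + z*z - 11) < 1e-9    # constraint g
    assert abs(x + y + z - 3) < 1e-12          # constraint h

vals = [F(*p) for p in pts]
# The first two points tie for the max, the last two tie for the min
assert abs(vals[0] - vals[1]) < 1e-9 and abs(vals[2] - vals[3]) < 1e-12
assert vals[0] > vals[2]
```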

By the way, the system of equations that arise from Lagrange multipliers is

$$2=2\lambda x +\mu $$
$$2=2\lambda y+\mu $$
$$2z=2\lambda z+\mu $$
$$ x^2+y^2+z^2=11 $$
$$ x+y+z=3 $$

In this particular example I think this method (implicit differentiation) is easier than Lagrange multipliers. From equation $(E1)$ it's readily seen that either $x=y$ or $z=1$, so the system is almost solved.

Best Answer

Let $g:\mathbb{R}^3 \to \mathbb{R}^2$ be given by $g((x,y,z)) = (g_1(x,y,z)-11, g_2(x,y,z)-3)$. The feasible set is described by $g^{-1}(\{0\})$, and locally you are trying to express two of the variables as functions of the remaining one.

Let me abuse notation and write $g(p,z)$ with $p=(x,y)$ to simplify my life.

Then you need to show that ${\partial g(p,z) \over \partial p}$ is invertible at a solution $(\hat{p}, \hat{z})$, and the implicit function theorem gives some $\pi$ defined locally around $\hat{z}$ such that $g(\pi(z),z) =0$ for $z$ near $\hat{z}$, and we have ${\partial \pi(\hat{z}) \over \partial z} = - {\partial g(\hat{p},\hat{z}) \over \partial p}^{-1} {\partial g(\hat{p},\hat{z}) \over \partial z}$.

If we let $\phi(z) = f(\pi(z),z)$ (abusing notation again), then at a solution we will have ${\partial \phi(\hat{z}) \over \partial z} = 0$; using the chain rule, this reduces to ${\partial f(\hat{p},\hat{z}) \over \partial z} = {\partial f(\hat{p},\hat{z}) \over \partial p} {\partial g(\hat{p},\hat{z}) \over \partial p}^{-1} {\partial g(\hat{p},\hat{z}) \over \partial z}$.
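This stationarity identity can be verified numerically at one of the candidate solutions from the question, $(x,y,z)=(3,-1,1)$, with $p=(x,y)$ and $f=F$. The sketch below (my addition) writes out the $2\times 2$ inverse by hand:

```python
# Candidate solution from the question, with p = (x, y)
x, y, z = 3.0, -1.0, 1.0

f_p = (2.0, 2.0)    # (f_x, f_y) for f = 20 + 2x + 2y + z^2
f_z = 2*z           # f_z

# dg/dp = [[g1_x, g1_y], [g2_x, g2_y]] and dg/dz = (g1_z, g2_z)
a, b = 2*x, 2*y     # row from g1 = x^2 + y^2 + z^2 - 11
c, d = 1.0, 1.0     # row from g2 = x + y + z - 3
g_z = (2*z, 1.0)

# (dg/dp)^{-1} dg/dz, using the explicit 2x2 inverse (1/det) [[d, -b], [-c, a]]
det = a*d - b*c
v = ((d*g_z[0] - b*g_z[1]) / det, (-c*g_z[0] + a*g_z[1]) / det)

# Stationarity: f_z = f_p . (dg/dp)^{-1} dg/dz
assert abs(f_z - (f_p[0]*v[0] + f_p[1]*v[1])) < 1e-12
```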

(As an aside, when used as a method for numerical optimization, this is known as 'reduced gradients'.)
