Not following the derivation of $\frac{dy}{dx}=-\frac{F_x}{F_y}$

chain-rule, implicit-differentiation, multivariable-calculus, partial-derivative

I've seen similar questions to mine asked on the forum, but I haven't seen answers that address the part I'm confused about.

My calculus textbook (Thomas from Pearson) derives the following formula to "take some of the algebra out of implicit differentiation":

Suppose the function $F(x,y)$ is differentiable and the equation $F(x,y)=0$ defines $y$ implicitly as a differentiable function of $x$. Then at any point where $F_y\neq 0$, we have $$\frac{dy}{dx}=-\frac{F_x}{F_y}.$$

(The formula itself is pretty intuitive to me, except for the negative sign.) I feel like I am misinterpreting the derivation given, as it seems to be using $F(x,y)$ to denote two different functions and treating them as if they are the same. The derivation goes like this:

Suppose that (1) the function $F(x,y)$ is differentiable and that (2) the equation $F(x,y)=0$ defines $y$ implicitly as a differentiable function of $x$. Since $w=F(x,y)=0$, the derivative $\frac{dw}{dx}$ must be zero.

As I understand this, they are defining a new function $w:\{(x,y):F(x,y)=0\}\rightarrow\{0\}$, a level curve of the original $F(x,y)$, which is zero everywhere on its domain, and we're to suppose that its domain defines $y$ implicitly in terms of $x$. But then they continue:

… Computing the derivative [of the equation $w=F(x,y)=0$] from the chain rule, we find $$0=\frac{dw}{dx}=F_x\frac{dx}{dx}+F_y\frac{dy}{dx}=F_x+F_y\frac{dy}{dx}.$$ Therefore, we have $$\frac{dy}{dx}=-\frac{F_x}{F_y}.$$

This is where I get confused. In the example questions, it is clear that $F_x$ and $F_y$ denote the partial derivatives of the original function $F(x,y)$ of which $w$ is a level curve. But this use of the chain rule seems to assume that those are also the partials of $w$ (which is a constant function, and should have zero derivatives, no?). I'm interpreting this as a special case of
$$\frac{dw}{dt}=\frac{\partial w}{\partial x}\frac{dx}{dt}+\frac{\partial w}{\partial y}\frac{dy}{dt}$$ where $t=x$, and where $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial y}$ are written as $F_x$ and $F_y$. But I'm not seeing how the former and the latter partials are equivalent. Why can we assume both that $\frac{dw}{dx}=0$ and that $F_x=\frac{\partial w}{\partial x}$, when $F_x$ is not zero in general? Or is that assumption not actually being made by using the chain rule this way? What am I missing or getting wrong here? I'd really appreciate it if someone would set me on the right track so that I can get some intuition for why this theorem works.
Thanks!

Best Answer

Yes, there are several abuses of notation here. What is happening is that you're first given a smooth function $F:\Bbb{R}^2\to\Bbb{R}$; for simplicity assume that at every point $p\in\Bbb{R}^2$, we have $\frac{\partial F}{\partial y}(p)\neq 0$. The implicit function theorem tells us that if you fix a point $p=(a,b)$ with $F(a,b)=0$, then you can find some smooth function $\eta:I\subset\Bbb{R}\to\Bbb{R}$, defined on an open interval $I$ containing $a$, such that $\eta(a)=b$ and for all $t\in I$, we have $F(t,\eta(t))=0$.

So, the function $w:I\to\Bbb{R}$ defined as $w(t)=F(t,\eta(t))$ is smooth and is zero at every point; i.e., it is the constant zero function. So, we obviously have that $w'=0$. But now what does the chain rule tell us (note that $w$ is the composition of $F$ with the function $t\mapsto (t,\eta(t))$, so the chain rule is indeed the way to go)? It tells us that for each $t\in I$,
\begin{align} 0&=w'(t)=\frac{\partial F}{\partial x}\bigg|_{(t,\eta(t))} \cdot 1+\frac{\partial F}{\partial y}\bigg|_{(t,\eta(t))}\cdot \eta'(t). \end{align}
Rearranging this equation, we get
\begin{align} \eta'(t)&=-\frac{\frac{\partial F}{\partial x}\bigg|_{(t,\eta(t))}}{\frac{\partial F}{\partial y}\bigg|_{(t,\eta(t))}}. \end{align}
Hopefully with the different notation, it's clear what the different functions are, how the chain rule is being applied, and where everything is evaluated.


If the $x,y$ are confusing (and I believe they are), you can write the chain rule computation as follows: for each $t\in I$,
\begin{align} 0&=w'(t)=(\partial_1F)_{(t,\eta(t))}\cdot 1+(\partial_2F)_{(t,\eta(t))}\cdot \eta'(t), \end{align}
and hence
\begin{align} \eta'(t)&=-\frac{(\partial_1F)_{(t,\eta(t))}}{(\partial_2F)_{(t,\eta(t))}}. \end{align}
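
As a concrete sanity check (an illustrative example of my own choosing, not one taken from the textbook), take $F(x,y)=x^2+y^2-1$ and restrict attention to the upper half of the unit circle, where $\partial_2F=2y\neq 0$. There the implicitly defined function is $\eta(t)=\sqrt{1-t^2}$ for $t\in I=(-1,1)$, and
\begin{align}
w(t)&=F(t,\eta(t))=t^2+(1-t^2)-1=0,\\
(\partial_1F)_{(t,\eta(t))}&=2t,\qquad (\partial_2F)_{(t,\eta(t))}=2\eta(t)=2\sqrt{1-t^2},\\
\eta'(t)&=-\frac{t}{\sqrt{1-t^2}}=-\frac{2t}{2\eta(t)}=-\frac{(\partial_1F)_{(t,\eta(t))}}{(\partial_2F)_{(t,\eta(t))}}.
\end{align}
So $w'$ vanishes identically even though $\partial_1F=2t$ is nonzero for $t\neq 0$: the two terms in the chain rule cancel, since $2t\cdot 1+2\eta(t)\cdot\eta'(t)=2t-2t=0$.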

It is an abuse of notation to use $y$ for both the coordinate and the implicitly defined function, and to use $F$ for both the original function and the composed function $w$; unfortunately, it is standard practice.
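
If it helps to see the distinction between $F$ and $w$ numerically, here is a minimal sketch in plain Python. It assumes an illustrative choice $F(x,y)=x^2+y^2-1$ with $\eta(t)=\sqrt{1-t^2}$ (my example, not the textbook's), and it approximates derivatives with central finite differences rather than computing them exactly. All function names are hypothetical helpers.

```python
import math


def F(x, y):
    # Illustrative choice (an assumption): F = 0 is the unit circle.
    return x**2 + y**2 - 1.0


def eta(t):
    # Upper branch of the circle, so that F(t, eta(t)) = 0 for |t| < 1.
    return math.sqrt(1.0 - t**2)


def w(t):
    # Composition of F with t -> (t, eta(t)); this is the constant zero function.
    return F(t, eta(t))


def central_diff(f, t, h=1e-6):
    # Central finite-difference approximation of f'(t).
    return (f(t + h) - f(t - h)) / (2.0 * h)


def partial_x(x, y, h=1e-6):
    # Approximates the partial derivative of F in its first slot.
    return (F(x + h, y) - F(x - h, y)) / (2.0 * h)


def partial_y(x, y, h=1e-6):
    # Approximates the partial derivative of F in its second slot.
    return (F(x, y + h) - F(x, y - h)) / (2.0 * h)


t = 0.6
Fx = partial_x(t, eta(t))         # partial of F, evaluated on the curve
Fy = partial_y(t, eta(t))
w_prime = central_diff(w, t)      # derivative of the composition w
eta_prime = central_diff(eta, t)  # slope dy/dx of the implicit function

print("F_x on the curve:", Fx)
print("w'(t):           ", w_prime)
print("eta'(t):         ", eta_prime)
print("-F_x / F_y:      ", -Fx / Fy)
```

The printed values should show that $F_x$ on the curve is clearly nonzero while $w'(t)$ is zero up to rounding, and that $\eta'(t)$ agrees with $-F_x/F_y$ to within the finite-difference error.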
