Not following the derivation of $\frac{dy}{dx}=-\frac{F_x}{F_y}$

I've seen similar questions to mine asked on the forum, but I haven't seen answers that address the part I'm confused about.

My calculus textbook (Thomas from Pearson) derives the following formula to "take some of the algebra out of implicit differentiation":

Suppose the function $F(x,y)$ is differentiable and the equation $F(x,y)=0$ defines $y$ implicitly as a differentiable function of $x$. Then at any point where $F_y\neq 0$, we have $$\frac{dy}{dx}=-\frac{F_x}{F_y}.$$

(The formula itself is pretty intuitive to me, except for the negative sign.) I feel like I am misinterpreting the derivation given, as it seems to be using $F(x,y)$ to denote two different functions and treating them as if they are the same. The derivation goes like this:

Suppose that (1) the function $F(x,y)$ is differentiable and that (2) the equation $F(x,y)=0$ defines $y$ implicitly as a differentiable function of $x$. Since $w=F(x,y)=0$, the derivative $\frac{dw}{dx}$ must be zero.

As I understand this, they are defining a new function $w:\{(x,y):F(x,y)=0\}\rightarrow\{0\}$, a level curve of the original $F(x,y)$, which is zero everywhere on its domain, and we're to suppose that its domain defines $y$ implicitly in terms of $x$. But then they continue:

… Computing the derivative [of the equation $w=F(x,y)=0$] from the chain rule, we find $$0=\frac{dw}{dx}=F_x\frac{dx}{dx}+F_y\frac{dy}{dx}=F_x+F_y\frac{dy}{dx}.$$ Therefore, we have $$\frac{dy}{dx}=-\frac{F_x}{F_y}.$$

This is where I get confused. In the example questions, it is clear that $F_x$ and $F_y$ denote the partial derivatives of the original function $F(x,y)$ of which $w$ is a level curve. But this use of the chain rule seems to assume that those are also the partials of $w$ (which is a constant function, and should have zero derivatives, no?). I'm interpreting this as a special case of
$$\frac{dw}{dt}=\frac{\partial w}{\partial x}\frac{dx}{dt}+\frac{\partial w}{\partial y}\frac{dy}{dt}$$ where $t=x$, and where $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial y}$ are written as $F_x$ and $F_y$. But I'm not seeing how the former and the latter partials are equivalent. Why can we assume both that $\frac{dw}{dx}=0$ and that $F_x=\frac{\partial w}{\partial x}$, when $F_x$ is not zero in general? Or is that assumption not actually being made by using the chain rule this way? What am I missing or getting wrong here? I'd really appreciate it if someone could set me on the right track so that I can get some intuition for why this theorem works.
Thanks!

Best Answer

Yes, there are several abuses of notation here. What is happening is this: you're first given a smooth function $F:\Bbb{R}^2\to\Bbb{R}$; for simplicity, assume that $\frac{\partial F}{\partial y}(p)\neq 0$ at every point $p\in\Bbb{R}^2$. The implicit function theorem tells us that if you fix a point $p=(a,b)$ with $F(a,b)=0$, then you can find an open interval $I\subset\Bbb{R}$ containing $a$ and a smooth function $\eta:I\to\Bbb{R}$ such that $\eta(a)=b$ and $F(t,\eta(t))=0$ for all $t\in I$.

So the function $w:I\to\Bbb{R}$ defined by $w(t)=F(t,\eta(t))$ is smooth and is zero at every point; i.e., it is the constant zero function. So we obviously have $w'=0$. But what does the chain rule tell us? (Note that $w$ is the composition of $F$ with the function $t\mapsto (t,\eta(t))$, so the chain rule is indeed the way to go.) It tells us that for each $t\in I$, \begin{align} 0&=w'(t)=\frac{\partial F}{\partial x}\bigg|_{(t,\eta(t))} \cdot 1+\frac{\partial F}{\partial y}\bigg|_{(t,\eta(t))}\cdot \eta'(t). \end{align} Rearranging this equation, we get \begin{align} \eta'(t)&=-\frac{\frac{\partial F}{\partial x}\bigg|_{(t,\eta(t))}}{\frac{\partial F}{\partial y}\bigg|_{(t,\eta(t))}}. \end{align}

Hopefully, with the different notation, it's clear what the different functions are, how the chain rule is being applied, and where everything is evaluated.
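
For a concrete instance (an illustrative choice of mine, not one taken from the textbook), take $F(x,y)=x^2+y^2-1$ near the point $(a,b)=(0,1)$, where $\frac{\partial F}{\partial y}=2y\neq 0$. Here one can solve explicitly: $\eta(t)=\sqrt{1-t^2}$ on $I=(-1,1)$, so \begin{align} w(t)&=F(t,\eta(t))=t^2+(1-t^2)-1=0,\\ \frac{\partial F}{\partial x}\bigg|_{(t,\eta(t))}&=2t,\qquad \frac{\partial F}{\partial y}\bigg|_{(t,\eta(t))}=2\sqrt{1-t^2},\\ \eta'(t)&=-\frac{2t}{2\sqrt{1-t^2}}=-\frac{t}{\sqrt{1-t^2}}, \end{align} which agrees with differentiating $\sqrt{1-t^2}$ directly. Note that $w'(t)=0$ for every $t\in I$ even though $\frac{\partial F}{\partial x}=2t$ is nonzero whenever $t\neq 0$: the partials belong to $F$ as a function on $\Bbb{R}^2$, while $w$ is a function of the single variable $t$.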

If the $x,y$ are confusing (and I believe they are), you can write the chain rule computation as follows: for each $t\in I$, \begin{align} 0&=w'(t)=(\partial_1F)_{(t,\eta(t))}\cdot 1+(\partial_2F)_{(t,\eta(t))}\cdot \eta'(t), \end{align} and hence \begin{align} \eta'(t)&=-\frac{(\partial_1F)_{(t,\eta(t))}}{(\partial_2F)_{(t,\eta(t))}}. \end{align}
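
If it helps, the same computation can be checked numerically. The following is only a rough sketch under my own assumptions: the example $F(x,y)=x^2+y^2-1$, the branch $\eta(t)=\sqrt{1-t^2}$, and the helper names `central_diff` and `check` are all hypothetical choices for illustration. It approximates $\partial_1F$, $\partial_2F$, $w'$ and $\eta'$ by central differences at a single point.

```python
import math


def F(x, y):
    # Assumed example: the unit circle is the zero level set of F.
    return x**2 + y**2 - 1.0


def eta(t):
    # Implicitly defined function on the upper branch, valid for -1 < t < 1.
    return math.sqrt(1.0 - t**2)


def central_diff(f, s, h=1e-6):
    # Central-difference approximation of f'(s) for a one-variable function f.
    return (f(s + h) - f(s - h)) / (2.0 * h)


def check(t):
    y = eta(t)
    # Partials of F on R^2, evaluated at the point (t, eta(t)).
    d1F = central_diff(lambda x: F(x, y), t)
    d2F = central_diff(lambda v: F(t, v), y)
    # Derivative of the one-variable composite w(s) = F(s, eta(s)).
    w_prime = central_diff(lambda s: F(s, eta(s)), t)
    # Derivative of eta itself, to compare against -d1F / d2F.
    eta_prime = central_diff(eta, t)
    return d1F, d2F, w_prime, eta_prime


d1F, d2F, w_prime, eta_prime = check(0.6)
print("d1F =", d1F, " d2F =", d2F)
print("w'  =", w_prime)
print("eta' =", eta_prime, " -d1F/d2F =", -d1F / d2F)
```

At $t=0.6$ the printed values should be roughly $1.2$ and $1.6$ for the two partials, roughly $0$ for $w'$, and roughly $-0.75$ for both $\eta'$ and $-\partial_1F/\partial_2F$, up to finite-difference error. The point is that `d1F` is a derivative of the two-variable function `F`, while `w_prime` is a derivative of a different, one-variable function, so there is no contradiction in one being nonzero and the other zero.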

It is an abuse of notation to use $y$ for both the coordinate and the implicitly defined function, and to use $F$ for both the original function and the new composed function $w$, but unfortunately it is standard practice.
