Confusion about chain rule with Leibniz’s notation

calculus, chain-rule

Suppose we have two functions $f,g:\Bbb R\rightarrow \Bbb R$. The chain rule states the following about the derivative of the composition of these functions, namely that
$$
(f \circ g)'(x) = f'(g(x))\cdot g'(x).
$$

However, the equivalent expression in Leibniz notation seems to say something different. I know that $f'(g(x))$ means the derivative of $f$ evaluated at $g(x)$, but in the Leibniz form of the chain rule it appears that it should really mean the derivative of $f$ with respect to $g(x)$. If we let $z=f(y)$ and $y=g(x)$, then
$$
{\frac {dz}{dx}}={\frac {dz}{dy}}\cdot {\frac {dy}{dx}}.
$$

Here, $\frac{dz}{dy}$ corresponds to $f'(g(x))$. Since $y=g(x)$, I am tempted to believe that the expression $f'(u)$ means the derivative of $f$ with respect to $u$; that would make sense in this case, as we are treating $g(x)$ as the independent variable. This leaves me with the question: does $f'(g(x))$ mean the derivative of $f$ evaluated at $g(x)$, $\frac{df}{dx} \Bigr\rvert_{x = g(x)}$, or the derivative of $f$ with respect to $g(x)$, $\frac{df}{dg(x)}$?

Best Answer

In my opinion the usual way of writing the chain rule in Leibniz notation is confusing and, I would say, bad. It's a frequent source of confusion on this website.

The function that is called $z$ on the left is not the same as the function that is called $z$ on the right. In other words, two different functions are being called by the same name. It would be better to give the function on the left its own name, such as $\hat z(x) = z(y(x))$. Then, using Leibniz notation, the chain rule could be written as $\frac{d\hat z}{dx} = \frac{dz}{dy} \frac{dy}{dx}$. This is still a little confusing: $\frac{dz}{dy}$ is to be interpreted as $z'(y(x))$.

In my opinion the notation $$\hat z'(x) = z'(y(x)) y'(x)$$ is far clearer.

To specifically address the final part of your question: $f'(g(x))$ is the derivative of $f$ evaluated at $g(x)$. I would not use the phrase "derivative of $f$ with respect to $g(x)$".
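
As a small illustration of "evaluated at" (the particular functions here are my own choice, not taken from the question), let $z(y) = \sin y$ and $y(x) = x^2$, so that $\hat z(x) = \sin(x^2)$. First differentiate $z$ as a function of its own input, then substitute $y(x)$ for that input:
$$
z'(y) = \cos y, \qquad z'(y(x)) = \cos(x^2), \qquad y'(x) = 2x,
$$
$$
\hat z'(x) = z'(y(x))\, y'(x) = 2x\cos(x^2).
$$
The factor $\cos(x^2)$ is just the function $z'$ with $x^2$ plugged in; no separate notion of differentiating "with respect to $x^2$" is needed.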


Edit: Here is the thought process behind the Leibniz notation, and an explanation for why it has become so popular despite the fact that I think it's confusing.

Think about the quantity $z(y(x))$, and imagine what happens if $x$ is perturbed by a small amount $\Delta x$. Then the output of $y$ is perturbed by a small amount $\Delta y$, and the output of $z$ is correspondingly perturbed by a small amount $\Delta z$. Provided $\Delta y \neq 0$, we have $$ \frac{\Delta z}{\Delta x} = \frac{\Delta z}{\Delta y} \frac{\Delta y}{\Delta x}. $$ The term on the left is approximately $\hat z'(x)$, but you can see the temptation to call it $\frac{dz}{dx}$. The term $\frac{\Delta z}{\Delta y}$ is approximately $z'(y(x))$, but you can see the temptation to call it $\frac{dz}{dy}$. And the term $\frac{\Delta y}{\Delta x}$ is approximately $y'(x)$, and of course you see the temptation to call it $\frac{dy}{dx}$.
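
The perturbation argument can be checked numerically. The following is only a rough sketch under my own assumptions: the choices $z(u) = \sin u$, $y(x) = x^2$, the point $x_0 = 1$, and the step size are illustrative and not part of the question. It computes the three difference quotients and compares them with $\hat z'(x)$, $z'(y(x))$, and $y'(x)$.

```python
import math


def y(x):
    # Inner function (illustrative choice): y(x) = x^2
    return x ** 2


def z(u):
    # Outer function (illustrative choice): z(u) = sin(u)
    return math.sin(u)


def difference_quotients(x, dx):
    """Return (dz/dx, dz/dy, dy/dx) as finite difference quotients at x."""
    dy = y(x + dx) - y(x)          # perturbation of the output of y
    dz = z(y(x) + dy) - z(y(x))    # resulting perturbation of the output of z
    return dz / dx, dz / dy, dy / dx


x0 = 1.0
dz_dx, dz_dy, dy_dx = difference_quotients(x0, 1e-6)

# Exact values from the chain rule, for comparison.
exact_outer = math.cos(y(x0))      # z'(y(x0)): z' evaluated at y(x0)
exact_inner = 2 * x0               # y'(x0)
exact_total = exact_outer * exact_inner

print(dz_dx, dz_dy * dy_dx, exact_total)
```

The first two printed numbers should agree up to rounding, since the identity between the difference quotients is exact, and both should be close to the third. Note that the quotient $\frac{\Delta z}{\Delta y}$ approximates $z'$ at the point $y(x_0)$, which matches the reading of $f'(g(x))$ as "the derivative of $f$ evaluated at $g(x)$".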
