[Math] What does $\frac{dy}{du}$ mean?

calculus

I understand that $dy/dx$ is the rate of change, and that it means "the rate of change of $y$ with respect to $x$", but when I see people use $dy/du$ I get confused ($u$ of course being any variable). What is meant by using $du$ or anything else but $dx$ in $dy/du$? When I look at a graph I can only see a $y$ and $x$ axis.

Here is a video example of $dy/du$:

I'm new to calculus and I'm trying to understand the chain rule.

Best Answer

This is a very problematic notation. I'll first talk informally and then give you the right way to do things. The point is that it has to do with a change of variables. For example, if you have $y=(ax+b)^2$ and we set $u = ax+b$, then $y=u^2$. Thus, if $x$ changes, $u$ changes too, and that change in $u$ produces a change in $y$ that you can calculate with the chain rule.
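To make the informal picture concrete, here is a small Python sketch. The values $a = 3$, $b = 1$ and the point $x = 2$ are arbitrary choices for illustration, not part of the question. It nudges $x$ by a small amount, tracks the resulting change in $u$ and then in $y$, and shows that the ratio of changes factors through $u$.

```python
# Illustrative constants (an assumption): y = (a*x + b)**2 with a = 3, b = 1.
a, b = 3.0, 1.0

def u_of_x(x: float) -> float:
    """Inner substitution u = a*x + b."""
    return a * x + b

def y_of_u(u: float) -> float:
    """Outer function y = u**2."""
    return u ** 2

x0, dx = 2.0, 1e-3                                   # starting point and a small change in x
du = u_of_x(x0 + dx) - u_of_x(x0)                    # the change in x causes a change in u
dy = y_of_u(u_of_x(x0 + dx)) - y_of_u(u_of_x(x0))    # which causes a change in y

# As long as du != 0, the quotient dy/dx factors exactly as (dy/du) * (du/dx).
print(dy / dx)                # approximately 2*a*(a*x0 + b) = 42
print((dy / du) * (du / dx))  # the same quotient, written as a product
```

Shrinking `dx` makes both printed values approach the derivative $2a(ax_0+b)$; the chain rule is the limiting version of this factorisation.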

In the usual notation for derivatives, the chain rule would be stated as:

$$\frac{dy}{dx}=\frac{dy}{du} \frac{du}{dx}.$$
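Applied to the example above, each factor on the right is an ordinary derivative: $dy/du$ treats $y = u^2$ as a function of $u$, and $du/dx$ treats $u = ax+b$ as a function of $x$. Substituting $u = ax+b$ at the end expresses the result in terms of $x$:

$$\frac{dy}{du} = 2u, \qquad \frac{du}{dx} = a, \qquad \frac{dy}{dx} = \frac{dy}{du}\,\frac{du}{dx} = 2u \cdot a = 2a(ax+b).$$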

Now, this notation is confusing and it is bad. Although everyone should know it in order to read books and articles that use it, people should really move to the modern notation. I'll explain why: it all has to do with the notion of composition of functions.

If we have two functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$, we can form the composition $g \circ f : \mathbb{R} \to \mathbb{R}$, defined by $(g\circ f)(x)=g(f(x))$. So the composition is the result of applying $g$ to the result of applying $f$ to $x$.

In this notation, the chain rule is written as $(g\circ f)'(x)=g'(f(x))f'(x)$, and this notation is much better because it carries no ambiguity. It says: "to take the derivative of a composition, differentiate $g$ and $f$ separately, evaluate the derivative of $g$ at $f(x)$, and multiply by the derivative of $f$ at $x$".
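The composition-based statement translates almost literally into code. The following is a minimal Python sketch, not a rigorous proof: the helper names `compose` and `derivative` are hypothetical, derivatives are approximated numerically with a central difference (an assumption that gives approximate, not exact, values), and $f = \sin$, $g = \exp$ are just illustrative choices. Note that `derivative(g, f(x))` is literally "the derivative of $g$ evaluated at $f(x)$".

```python
import math
from typing import Callable

RealFn = Callable[[float], float]  # stands in for a function R -> R

def compose(g: RealFn, f: RealFn) -> RealFn:
    """Return the composition g o f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))

def derivative(h: RealFn, x: float, step: float = 1e-5) -> float:
    """Approximate h'(x) with a central difference (numerical, not exact)."""
    return (h(x + step) - h(x - step)) / (2 * step)

f = math.sin  # inner function (illustrative choice)
g = math.exp  # outer function (illustrative choice)
x = 0.7

lhs = derivative(compose(g, f), x)            # (g o f)'(x), computed directly
rhs = derivative(g, f(x)) * derivative(f, x)  # g'(f(x)) * f'(x), as the chain rule says
exact = math.exp(math.sin(x)) * math.cos(x)   # closed form, for comparison

print(lhs, rhs, exact)  # all three agree to several decimal places
```

No letter like $u$ is needed here: the inner and outer functions are separate objects, and the point at which each derivative is evaluated is explicit.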

The example I gave you would have $g(x)=x^2$, $f(x)=ax+b$ and so

$$(g\circ f)(x)=g(f(x))=(ax+b)^2$$

To differentiate, we have $g'(x) = 2x$ and $f'(x)=a$ so $g'(f(x))=2(ax+b)$ and then:

$$(g\circ f)'(x)=2a(ax+b)$$
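As a sanity check that does not use the chain rule at all, one can expand the square first and differentiate term by term; the result agrees:

$$(ax+b)^2 = a^2x^2 + 2abx + b^2 \quad\Longrightarrow\quad \frac{d}{dx}(ax+b)^2 = 2a^2x + 2ab = 2a(ax+b).$$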

The usual notation carries ambiguities. First, notice that the function being defined doesn't depend on the letter used: the letter is just a symbol! So, writing $f(x)=x^2$ or $f(u)=u^2$ is the exact same thing, $x$ and $u$ are just placeholders for real numbers.

In the usual notation, the left-hand side $dy/dx$ is the derivative of $y=(ax+b)^2$ with respect to $x$, while the factor $dy/du$ on the right-hand side is the derivative of $y=u^2$ with respect to $u$. So $y$ is representing two different functions, and this confuses a lot of people.
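A short Python sketch can make the double meaning of $y$ visible. It assumes the same illustrative constants $a = 3$, $b = 1$ as before (not from the question), and the function names `y_in_terms_of_x`, `y_in_terms_of_u` and `derivative` are hypothetical. Both functions are "$y$", yet differentiating them at the same number gives different answers.

```python
from typing import Callable

a, b = 3.0, 1.0  # illustrative constants (an assumption) for y = (a*x + b)**2

def y_in_terms_of_x(x: float) -> float:
    return (a * x + b) ** 2  # "y" read as a function of x

def y_in_terms_of_u(u: float) -> float:
    return u ** 2  # "y" read as a function of u

def derivative(h: Callable[[float], float], t: float, step: float = 1e-5) -> float:
    """Central-difference approximation of h'(t)."""
    return (h(t + step) - h(t - step)) / (2 * step)

t = 2.0
print(derivative(y_in_terms_of_x, t))  # about 2*a*(a*t + b) = 42
print(derivative(y_in_terms_of_u, t))  # about 2*t = 4
```

Evaluated where the chain rule actually needs it, at $u = at + b = 7$, the second function's derivative is $14$, and multiplying by $du/dx = a = 3$ gives back $42$.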

All of this is really confusing, and the rigorous framework exists precisely because we don't want ambiguities like these. Learning something rigorously can seem a little harder, but you will be able to understand it without such ambiguities. So my suggestion is this: get the book Calculus by Michael Spivak. It will teach you to think about calculus in a logical way and to steer clear of this kind of confusion.

I hope this helps you somehow. Good luck!

EDIT: The notation $f: \mathbb{R} \to \mathbb{R}$ just means that $f$ is a function with domain $\mathbb{R}$ and codomain $\mathbb{R}$; in other words, $f$ maps real numbers to real numbers. In general, given sets $A$ and $B$, a function that takes elements of $A$ to elements of $B$ is written $f : A \to B$.