[Math] Why do differentiation rules work? What’s the intuition behind them? (Not asking for proofs)

calculusderivativesintuitionreal-analysis

Differentiation rules have been bugging me ever since I took Basic Calculus. I thought I'd develop some intuitive understanding of them eventually, but so far all my other math courses (including Multivariable Calculus) take the rules for granted.

I know how to prove some of the rules. The problem is that algebra manipulation alone isn't quite convincing to me. Is there any possibility of understanding why the algebra happens to work that way? For example, why do the slopes of the tangent line to the parabola x^2 happen to be determined by 2x? Looking at it graphically, there's no way I could've told that.

Any sources covering this issue (books; internet sites; etc) would be very greatly appreciated. Thanks in advance.

Best Answer

The key intuition, first of all, is that the product of two tiny differences is negligible. You can intuit this just by doing computations:

$$3.000001 \cdot 2.0001 = 6.0003020001$$

If we are doing any sort of rounding of hand computations, we'd likely round away that $0.0000000001$ part. If you were doing computations to eight significant digits, a value $v$ is really a value in a range roughly of $v\left(1 \pm 10^{-8}\right)$ and the error when you multiply $v_1$ by $v_2$ is almost entirely $10^{-8}|v_1v_2|$. The other part of the error is so tiny you'd probably ignore it.

Case: $f(x)=x^2$

Now, consider a square with corners $(0,0), (0,x), (x,0), (x,x)$. Grow $x$ a little bit, and you see the area grows by proportionally by the size of two of the edges, plus a tiny little square. That tiny square is negligible.

This is a little harder to visualize for $x^n$, but it actually works the same way when $n$ is a positive integer, by considering an $n$-dimensional hypercube.

This geometric reason is also why the circumference of a circle is equal to the derivative of its area – if you increase the radius a little, the area is increased by approximately that "little" times the circumference. So the derivative of $\pi r^2$ is the circumference of the circle, $2\pi r$.

It's also a way to understand the product rule. (Or, indeed, FOIL.)

Case: The chain rule

The chain rule is better seen by considering an odd-shaped tub. Let's say that when the volume of the water in a tube is $v$ then the tub is filled to depth $h(v)$. Then assume that we have a hose that, between time $0$ and time $t$, has sent a volume of $v(t)$ water.

At time $t$, what is the rate that the height of the water is increasing?

Well, we know that when the current volume is $v$, then the rate at which the height is increasing is $h'(v)$ times the rate the volume is increasing. And the rate the volume is increasing is $v'(t)$. So the rate the height is increasing is $h'(v(t)) \cdot v'(t)$.

Case: Inverse function

This is the one case where it is obvious from the graph. When you flip the coordinates of a Cartesian plane, a line of slope $m$ gets sent to a line of slope $1/m$. So if $f$ and $g$ are inverse functions, then the slope of $f$ at $(x,f(x))$ is the inverse of the slope of $g$ at $(f(x),x)=(f(x),g(f(x)))$. So $g'(f(x))=1/f'(x)$.

$x^2$ revisited

Another way of dealing with $f(x)=x^2$ is thinking again of area, but thinking of it in terms of units. If we have a square that is $x$ centimeters, and we change that by a small amount, $\Delta x$ centimeters, then the area is $x^2\mathrm{cm}^2$ and it goes to approximately $f(x+\Delta x)-f(x)=f'(x)\Delta x$.

On the other hand, if we measure the square in meters, it has side length $x/100$ meters and area $(x/100)^2$. The change in the side length is $(\Delta x)/100$ meters. So the expected area change is $f'(x/100)\cdot (\Delta x)/100$ square meters. But this difference should be the same, so $$f'(x)\Delta x = f'(x/100)\cdot\frac{\Delta x}{100}\cdot \left(100^2 \text{m}^2/\text{cm}^2\right) = 100 f'(x/100)$$

More generally, then, we see that $f'(ax)=af'(x)$ when $f(x)=x^2$ by changing units from centimeters to a unit that is $1/a$ centimeters.

So we see that $f'(x)$ is linear, although it doesn't explain why $f'(1)=2$.

If you do the same for $f(x)=x^n$, with units $\mu$ and another unit $\rho$ where $a\rho = \mu$, then you get that the a change in volume when changing by $\Delta x\,\mu$ is $f'(x)\Delta x\,\mu^n$. It is also $f'(ax)\cdot a(\Delta x)\,\rho^n$. Since $\mu/\rho = a$, this means $f'(ax) =a^{n-1}f'(x)$.

Again, we still don't know why $f'(1)=n$, but we know $f'(x)=f'(1)x^{n-1}$.