Calculus – Intuitive Reasoning Behind the Chain Rule in Multiple Variables

calculus, linear algebra

I've sort of gotten a grasp on the chain rule with one variable. If you hike up a mountain at 2 feet per hour, and the temperature decreases at 2 degrees per foot, the temperature would be decreasing for you at $2\times 2 = 4$ degrees per hour.

But I'm having a bit more trouble understanding the chain rule as applied to multiple variables. Take even the two-dimensional case

$$z = f(x,y),$$

where $x = g(t)$ and $y = h(t)$, so

$$\frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt}.$$

Now, this is easy enough to "calculate" (and to figure out what goes where). My teacher taught me a neat tree-based graphical method for working out partial derivatives with the chain rule. All in all, though, it was rather hand-wavy, and I'm not sure exactly why it works, intuitively.

Why, intuitively, is the equation above true? Why addition? Why not multiplication, like the other chain rule? Why are some multiplied and some added?

Best Answer

The basic reason is that one is simply composing the derivatives just as one composes the functions. Derivatives are linear approximations to functions. When you compose the functions, you compose the linear approximations---not a surprise.

I'm going to try to expand on Harry Gindi's answer, because that was the only way I could grok it, but in somewhat simpler terms. The way to think of a derivative in multiple variables is as a linear approximation. In particular, let $f: R^m \to R^n$ and $q=f(p)$. Then near $p$, we can write $f$ as $q$ plus something linear plus some "noise" which "doesn't matter" (i.e. is little-o of the distance to $p$). Call this linear map $L: R^m \to R^n$.
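To make the "noise doesn't matter" claim concrete, here is a minimal numerical sketch. The function `f` and the point `p` are hypothetical choices for illustration (a map from $R^2$ to $R$, so $m = 2$ and $n = 1$), and the partial derivatives in `linear_part` are worked out by hand for that particular `f`. The script measures the leftover error after subtracting the linear part, divided by the distance to $p$; if the error really is little-o of the distance, that ratio should shrink toward zero.

```python
import math


def f(x, y):
    # Hypothetical example map R^2 -> R (not from the original post).
    return x * x * y + math.sin(y)


def linear_part(px, py, hx, hy):
    # The linear map L at p = (px, py), applied to the displacement (hx, hy).
    # Partial derivatives of the f above, computed by hand.
    df_dx = 2 * px * py
    df_dy = px * px + math.cos(py)
    return df_dx * hx + df_dy * hy


p = (1.0, 2.0)
direction = (0.6, 0.8)  # a unit vector, so `size` below is the distance to p

ratios = []
for k in range(1, 6):
    size = 10.0 ** (-k)
    hx, hy = size * direction[0], size * direction[1]
    # Error of the approximation f(p + h) ~ f(p) + L(h).
    error = f(p[0] + hx, p[1] + hy) - f(*p) - linear_part(p[0], p[1], hx, hy)
    ratios.append(abs(error) / size)

print(ratios)
```

Each time the distance shrinks by a factor of ten, the printed ratio should shrink by roughly a factor of ten as well, which is what "little-o of the distance" looks like in practice for a smooth function.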

Now, suppose $g: R^n \to R^s$ is some map and $r = g(q)$. We can approximate $g$ near $q$ by $r$ plus some linear map $N$ plus some "garbage" which is, again, small.

For simplicity, I'm going to assume that $p,q,r$ are all zero. This is ok, because one can just move one's origin around a bit.

So, as before, applying $f$ to a point near zero corresponds loosely to applying the linear transformation $L$. Applying $g$ to a point near zero corresponds loosely to applying $N$. Hence applying $g \circ f$ corresponds up to some ignorable "garbage" to the map $N \circ L$.

This means that $N \circ L$ is the linear approximation to $g \circ f$ at zero; in particular, this composition is the derivative of $g \circ f$.
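To connect this with the formula in the question, here is a sketch of the special case asked about. Note that the letters play different roles there: the inner map is $t \mapsto (g(t), h(t))$ from $R$ to $R^2$, and the outer map is the question's $f: R^2 \to R$. Writing the two linear approximations as matrices, the inner one is a $2 \times 1$ matrix $L$ and the outer one is a $1 \times 2$ matrix $N$:

$$L = \begin{pmatrix} \dfrac{dx}{dt} \\[2ex] \dfrac{dy}{dt} \end{pmatrix}, \qquad N = \begin{pmatrix} \dfrac{\partial z}{\partial x} & \dfrac{\partial z}{\partial y} \end{pmatrix}.$$

Composing linear maps corresponds to multiplying their matrices, so

$$N L = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt} = \frac{dz}{dt}.$$

So the multivariable rule is still "multiplication", only of matrices rather than numbers. The addition comes from how a matrix product is computed: multiply along each route from $t$ to $z$ (through $x$, or through $y$), then sum over the routes. In the one-variable case both matrices are $1 \times 1$, and the product reduces to the ordinary product of two numbers, as in the hiking example.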
