Understanding a Detail in Proof of Chain Rule

calculus, derivatives, real-analysis

I am currently studying from Courant's Differential and Integral Calculus, and I have been trying to understand his proof of the chain rule, which uses the idea that the derivative is the best linear approximation of the function at a point.

To give context for Courant's notation, I will write out his definition of a compound function before giving his proof of the chain rule:

  • Definition of compound function:

Let $\phi (x)$ be a function which is differentiable in an interval $x \in [a, b]$ and assumes all values in the interval $\phi \in [\alpha, \beta]$. Consider a second differentiable function $g(\phi)$ of the independent variable $\phi$, in which the variable $\phi$ ranges over the interval from $\alpha$ to $\beta$. We can now regard the function $g(\phi) = g \{ \phi (x) \} = f(x)$ as a function of $x$ in the interval $x \in [a, b]$.

  • Proof of Chain Rule:

For any arbitrary $\Delta x \neq 0$ and corresponding values $\Delta \phi$ and $\Delta g$ there exist two quantities $\epsilon$ and $\eta$, tending to 0 with $\Delta x$, such that $$\Delta g = g'(\phi)\Delta \phi + \epsilon\Delta\phi, \\ \Delta \phi = \phi'(x)\Delta x + \eta\Delta x;$$
we have only to calculate $\eta$ from the second equation and, where $\Delta \phi \neq 0$, $\epsilon$ from the first equation, while if $\Delta \phi = 0$, we put $\epsilon = 0$. If in the first of these equations we now substitute the value of $\Delta \phi$ from the second equation, we obtain $$\Delta g = g'(\phi)\phi'(x)\Delta x + \{\eta g'(\phi) + \epsilon\phi'(x) + \epsilon\eta\}\Delta x$$
which can be rewritten as $$\frac {\Delta g}{\Delta x} = g'(\phi)\phi'(x) + \{\eta g'(\phi) + \epsilon\phi'(x) + \epsilon\eta\}.$$
In this equation, we can let $\Delta x$ tend to $0$, and the bracket on the right tends to zero with $\Delta x$. The left-hand side of our equation has a limit $f'(x)$, and this limit is equal to the first term on the right-hand side: $$f'(x)=g'(\phi)\phi'(x)$$
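The algebra in the proof can be checked numerically. Here is a minimal sketch in which the concrete choices $g(\phi) = \sin\phi$ and $\phi(x) = x^3 + x$ are my own illustration, not Courant's; it computes $\eta$ and $\epsilon$ exactly as the proof prescribes and verifies that the identity $\Delta g/\Delta x = g'(\phi)\phi'(x) + \{\eta g'(\phi) + \epsilon\phi'(x) + \epsilon\eta\}$ holds, with the bracket shrinking as $\Delta x$ does:

```python
import math

def phi(x): return x**3 + x      # inner function; phi'(x) = 3x^2 + 1
def g(p):   return math.sin(p)   # outer function; g'(p) = cos(p)

x = 0.5
phi_prime = 3 * x**2 + 1
g_prime = math.cos(phi(x))

for dx in [1e-1, 1e-2, 1e-3, 1e-4]:
    dphi = phi(x + dx) - phi(x)
    dg = g(phi(x + dx)) - g(phi(x))
    eta = dphi / dx - phi_prime                         # solved from the second equation
    eps = dg / dphi - g_prime if dphi != 0 else 0.0     # solved from the first; 0 when dphi == 0
    bracket = eta * g_prime + eps * phi_prime + eps * eta
    # The substitution identity is exact (up to floating-point rounding):
    assert abs(dg / dx - (g_prime * phi_prime + bracket)) < 1e-9
    print(dx, eta, eps, bracket)   # eta, eps, and the bracket all shrink with dx
```

Running this shows $\eta$, $\epsilon$, and the bracket decreasing roughly in proportion to $\Delta x$, so $\Delta g/\Delta x$ approaches $g'(\phi)\phi'(x)$ as the proof claims.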

The part I am confused about in Courant's proof is why he goes out of his way to state:

we have only to calculate $\eta$ from the second equation and, where $\Delta \phi \neq 0$, $\epsilon$ from the first equation, while if $\Delta \phi = 0$, we put $\epsilon = 0$

Why do we have to set $\epsilon = 0$ when $\Delta \phi = 0$? Why can't we set $\epsilon = 1$ instead? From my naive perspective, I can't see any issue with setting $\epsilon$ to $1$ when $\Delta \phi$ is $0$.

From what I understand, the entire proof rests only on the condition that $g(\phi)$ and $\phi(x)$ are differentiable, which is what allows us to write the expressions for $\Delta g$ and $\Delta \phi$ and to claim that there exist two quantities $\epsilon$ and $\eta$ that tend to $0$ with $\Delta x$.

Is there something I am misunderstanding about Courant's proof or is my interpretation of his proof correct?

Best Answer

The problem is that both of the original equations are only guaranteed for $\Delta\phi$ and $\Delta x$ nonzero but tending to $0$. Courant's first sentence is, strictly speaking, not automatic: differentiability of $g$ tells us that $\epsilon$ tends to $0$ with $\Delta\phi$, not with $\Delta x$. But as $\Delta x\to 0$, it may happen that $\Delta\phi$ is not just small but actually equals $0$ for infinitely many (small) values of $\Delta x$. For those values of $\Delta x$ the first equation cannot be solved for $\epsilon$ at all (that would mean dividing by $\Delta\phi = 0$), so $\epsilon$ must be assigned. If you assign $\epsilon = 1$ when $\Delta\phi = 0$, the claim that $\epsilon$ tends to $0$ with $\Delta x$ becomes false, and you get the extra term $\epsilon \phi'(x) = \phi'(x)$ in the bracket of the final formula. Basically, you want a choice of $\epsilon$ that varies continuously across small and zero values of $\Delta\phi$, and $\epsilon = 0$ is the value that continuity forces.
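To make the degenerate case concrete, here is a minimal sketch; the constant inner function is my own illustration (the simplest function whose increment vanishes for every $\Delta x$), not taken from the answer. With $\Delta\phi = 0$ identically, $\epsilon$ cannot be solved from the first equation and must be assigned, and $\epsilon = 0$ is the assignment that keeps the proof's claim "$\epsilon \to 0$ with $\Delta x$" true:

```python
# Degenerate case: constant inner function, so Delta phi == 0 for every Delta x.
def phi(x): return 2.0          # phi'(x) = 0
def g(p):   return p * p        # g'(p) = 2p

x = 1.0
phi_prime = 0.0
g_prime = 2 * phi(x)

for dx in [1e-1, 1e-2, 1e-3]:
    dphi = phi(x + dx) - phi(x)         # exactly 0
    dg = g(phi(x + dx)) - g(phi(x))     # also exactly 0
    # dg / dphi - g_prime would divide by zero, so epsilon is assigned, not computed:
    eps = 0.0                            # Courant's choice; keeps eps -> 0 with dx
    eta = dphi / dx - phi_prime          # solved from the second equation (here 0)
    bracket = eta * g_prime + eps * phi_prime + eps * eta
    # The chain-rule identity Delta g / Delta x = g'(phi) phi'(x) + bracket holds:
    assert dg / dx == g_prime * phi_prime + bracket == 0.0
    # Assigning eps = 1 instead would leave the identity true (0 == 0), but it
    # would falsify the proof's assertion that eps tends to 0 with Delta x.
```

The design point is the one the answer makes: the identity itself tolerates any $\epsilon$ when $\Delta\phi = 0$, but the limit argument needs $\epsilon \to 0$ with $\Delta x$, and only the assignment $\epsilon = 0$ secures that.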