[Math] Spivak’s Chain Rule Proof (Image of proof provided)

calculus, chain rule, derivatives, proof-explanation, real-analysis

If $g$ is differentiable at $a$, and $f$ is differentiable at $g(a)$, then $f \circ g$ is differentiable at $a$, and
$$
(f \circ g)^{\prime}(a)=f^{\prime}(g(a)) \cdot g^{\prime}(a).
$$

Define a function $\phi$ as follows:
$$
\phi(h)= \begin{cases}\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}, & \text { if } g(a+h)-g(a) \neq 0 \\ f^{\prime}(g(a)), & \text { if } g(a+h)-g(a)=0 .\end{cases}
$$

It should be intuitively clear that $\phi$ is continuous at $0:$ When $h$ is small, $g(a+h)-g(a)$ is also small, so if $g(a+h)-g(a)$ is not zero, then $\phi(h)$ will be close to $f^{\prime}(g(a)) ;$ and if it is zero, then $\phi(h)$ actually equals $f^{\prime}(g(a))$, which is even better. Since the continuity of $\phi$ is the crux of the whole proof we will provide a careful translation of this intuitive argument.

We know that $f$ is differentiable at $g(a) .$ This means that
$$
\lim _{k \rightarrow 0} \frac{f(g(a)+k)-f(g(a))}{k}=f^{\prime}(g(a)).
$$

Thus, if $\varepsilon>0$ there is some number $\delta^{\prime}>0$ such that, for all $k$,
$$ \text{if $0<|k|<\delta^{\prime}$, then $\left|\frac{f(g(a)+k)-f(g(a))}{k}-f^{\prime}(g(a))\right|<\varepsilon$}. \tag{1} $$
Now $g$ is differentiable at $a$, hence continuous at $a$, so there is a $\delta>0$ such that, for all $h$,
$$\text{ if $|h|<\delta$, then $|g(a+h)-g(a)|<\delta^{\prime} .$}\tag{2}$$
Consider now any $h$ with $|h|<\delta .$ If $k=g(a+h)-g(a) \neq 0$, then
$$
\phi(h)=\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}=\frac{f(g(a)+k)-f(g(a))}{k} ;
$$

it follows from $(2)$ that $|k|<\delta^{\prime}$, and hence from (1) that
$$
\left|\phi(h)-f^{\prime}(g(a))\right|<\varepsilon.
$$

(Transcribed from a screenshot of the proof.)

Here is a proof of the chain rule in Spivak's Calculus. Note there is a second page, but I understand it, and this is the meat of the proof. I have a few questions.

$\textbf{1.}$ "It should be intuitively clear that $\phi$ is continuous at $0$." Do we care that it is continuous at zero so we will not have a division by zero since $g(a+h)-g(a)$ is in the denominator and could equal zero? I am not sure I understand why it is continuous at zero. I understand what he was saying but I was always under the impression continuity was when there were no breaks in the graph visually. Here, I am imagining $\phi(h)$ being continuous up to zero, then it jumping to another point when it is zero.

$\textbf{2.}$ At $(2)$, I do not understand what we are trying to do. We seem to switch abruptly to $h$, and I think we are writing out the definition of continuity. The switching back and forth between $k$ and $h$ is confusing me.

Best Answer

The "intuitively clear" fact is that there is no visual break in the graph of $\phi(h)$. Sure, the graph of $$ \phi_1(h) = \frac{f(g(a + h)) - f(g(a))}{g(a + h) - g(a)} $$ has a "hole" where $h = 0$, and depending on the other values of $g(a+h)$, there may be additional holes or even entire intervals of the $h$-axis that have no value of $\phi_1(h)$. (Basically, whenever $g(a+h) = g(a)$, there is no value of $\phi_1(h)$.) But the only way to approach one of those "holes" is for the graph of the function to come right up to (or down to) the horizontal line that graphs the constant function $\phi_2(h) = f'(g(a))$. Every "hole" in $\phi_1(h)$ begins and ends on that line, and the second half of the definition of $\phi$ fills in each of those holes with exactly the function value that will connect all the pieces of the graph, namely the value $f'(g(a))$.
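To make the "holes" concrete, here is a minimal numerical sketch. The functions are my own illustrative choices, not anything from Spivak: $g(x) = x^2 \sin(1/x)$ with $g(0) = 0$, $f(y) = y^2 + 3y$, and $a = 0$, so that $f'(g(a)) = 3$ and $g(a+h) - g(a) = 0$ at every $h = 1/(n\pi)$. For this particular choice a little algebra gives $\phi(h) = 3 + g(h)$, so $\lvert \phi(h) - 3 \rvert \le h^2$, and the code only checks that numerically.

```python
import math

A = 0.0                 # the point a (an assumption for this example)
F_PRIME_AT_G_A = 3.0    # f'(y) = 2y + 3 and g(A) = 0, so f'(g(A)) = 3


def g(x):
    # Hypothetical inner function: differentiable at 0, and equal to
    # g(0) = 0 again at every x = 1/(n*pi), so phi_1 has holes near 0.
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0


def f(y):
    # Hypothetical outer function, differentiable everywhere.
    return y * y + 3.0 * y


def phi(h):
    # The two-case definition from the proof.
    k = g(A + h) - g(A)
    if k != 0:
        return (f(g(A) + k) - f(g(A))) / k
    return F_PRIME_AT_G_A   # fills the hole with f'(g(a))


def max_gap(hs):
    # Largest distance between phi(h) and f'(g(a)) over the sample hs.
    return max(abs(phi(h) - F_PRIME_AT_G_A) for h in hs)
```

Calling `max_gap` on samples of $h$ that shrink toward $0$ should give values that shrink as well, which is what continuity of $\phi$ at $0$ predicts. One caveat: in floating point, $\sin(n\pi)$ is rarely exactly zero, so the second branch is mostly exercised at $h = 0$ itself; near the other "holes" the first branch runs with a tiny $k$ and still returns a value close to $3$.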

For the second part of your question, yes, all the business with statements $(1)$ and $(2)$ is directly using the epsilon-delta definition of continuity. But it requires two applications of the definition, logically connected to each other, so we can't just use the symbols $\varepsilon$ and $\delta$ both times: the "delta" from one application of the definition is the "epsilon" for the other application.

In order to keep the symbols unambiguous, the proof uses $\varepsilon$ and $\delta'$ for the "epsilon" and "delta" in statement $(1)$, and it uses $\delta'$ and $\delta$ for the "epsilon" and "delta" in statement $(2)$.
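Written as a single chain, using only statements $(1)$ and $(2)$ and the definition of $\phi$ (this is just my summary of how the two statements fit together, with $k = g(a+h) - g(a)$), the case $k \neq 0$ reads
$$
|h|<\delta \;\overset{(2)}{\Longrightarrow}\; 0<|k|<\delta^{\prime} \;\overset{(1)}{\Longrightarrow}\; \left|\frac{f(g(a)+k)-f(g(a))}{k}-f^{\prime}(g(a))\right| = \left|\phi(h)-f^{\prime}(g(a))\right|<\varepsilon,
$$
while in the case $k = 0$ the definition of $\phi$ gives directly
$$
\left|\phi(h)-f^{\prime}(g(a))\right| = \left|f^{\prime}(g(a))-f^{\prime}(g(a))\right| = 0 < \varepsilon.
$$
So $\delta'$ is the output of the first application of the definition (to $f$) and the input to the second (to $g$).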

You do have to keep track of what $h$ is versus what $k$ is. I think the trickiest part is near the end, in the sentence that starts, "If $k = g(a + h) - g(a) \neq 0$". By that time we have the condition $\lvert h \rvert < \delta$, which guarantees that we don't produce any $k$ that violates $0 < \lvert k \rvert < \delta'$ this way, but we don't necessarily produce every value of $k$ that would satisfy that condition (which is OK; we don't need to do that). Also, we don't necessarily use every value of $h$ such that $\lvert h \rvert < \delta$: any $h$ for which $g(a + h) - g(a) = 0$ (including $h = 0$ itself) has no corresponding nonzero value of $k$; instead, it produces one of the values of $\phi(h)$ that is already at the limit we're trying to show. Yes, this is complicated, and maybe that contributes to the opinions expressed in some other answers and comments that you might prefer to look at someone else's proof.