I saw this proof of the chain rule, but it is said to be flawed. Why? I guessed the reason it is wrong is that you can't substitute $g(x+h)$ and $g(x)$ into the limit.
[Math] Why is this proof of the chain rule incorrect?
calculus, chain-rule, derivatives, proof-verification
Related Solutions
Instead of putting so many long comments, I thought it would be better to write a short answer. We start with a point $a$ and use the notation $(f\circ g)(x) = f(g(x))$. Then we have $$(f\circ g)'(a) = \lim_{x \to a}\frac{f(g(x)) - f(g(a))}{x - a}$$ We are given that $$\lim_{y \to g(a)}\frac{f(y) - f(g(a))}{y - g(a)} = f'(g(a)) = A, \qquad \lim_{x \to a}\frac{g(x) - g(a)}{x - a} = g'(a) = B$$ We need to show that $(f\circ g)'(a) = AB$. Clearly we need to distinguish two cases:
1) There is a neighborhood of $a$ in which $g(x) \neq g(a)$ when $x \neq a$. In this case we have $$(f\circ g)'(a) = \lim_{x \to a}\frac{f(g(x)) - f(g(a))}{x - a} = \lim_{x \to a}\frac{f(g(x)) - f(g(a))}{g(x) - g(a)}\cdot\frac{g(x) - g(a)}{x - a} = AB$$
2) Every neighborhood of $a$ contains infinitely many points $x \neq a$ such that $g(x) = g(a)$. Hence it is possible to find a sequence $x_{n} \to a$ such that $x_{n} \neq a$ and $g(x_{n}) = g(a)$. Now the limit $B$ exists, and hence $$B = \lim_{x \to a}\frac{g(x) - g(a)}{x - a} = \lim_{n \to \infty}\frac{g(x_{n}) - g(a)}{x_{n} - a} = 0$$ Now we need to show that $(f\circ g)'(a) = AB = 0$. Consider the ratio $$F(x, a) = \frac{f(g(x)) - f(g(a))}{x - a}$$ If $g(x) = g(a)$ then $F(x, a) = 0$. If $g(x) \neq g(a)$ then we can write $$F(x, a) = \frac{f(g(x)) - f(g(a))}{g(x) - g(a)}\cdot\frac{g(x) - g(a)}{x - a}$$ and the first factor is near $A$ while the second factor is near $B = 0$. So in a sufficiently small neighborhood of $a$ we either have $F(x, a) = 0$ or $F(x, a)$ is very small. Using an $\epsilon, \delta$ argument we can show that for any $\epsilon > 0$ there is a $\delta > 0$ such that $|F(x, a)| < \epsilon$ whenever $0 < |x - a| < \delta$. This shows that $\lim_{x \to a}F(x, a) = 0$, i.e. $(f\circ g)'(a) = 0$, as was to be shown.
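The $\epsilon, \delta$ argument can be filled in explicitly; here is one way to do it, using only the two limits assumed above. Since $\lim_{y \to g(a)}\frac{f(y) - f(g(a))}{y - g(a)} = A$, there is an $\eta > 0$ such that $$|f(y) - f(g(a))| \leq (|A| + 1)\,|y - g(a)|$$ whenever $|y - g(a)| < \eta$ (the inequality holds trivially when $y = g(a)$). Since $B = 0$, there is a $\delta > 0$ such that for $0 < |x - a| < \delta$ we have both $|g(x) - g(a)| < \eta$ and $$|g(x) - g(a)| \leq \frac{\epsilon}{|A| + 1}\,|x - a|.$$ Combining the two, for $0 < |x - a| < \delta$: $$|F(x, a)| = \frac{|f(g(x)) - f(g(a))|}{|x - a|} \leq (|A| + 1)\,\frac{|g(x) - g(a)|}{|x - a|} \leq \epsilon.$$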
Differentials are indeed not numbers*. Basically, they represent a change in a quantity which has "already gone to zero". Thus they really only carry meaning in a ratio, where you can have a ratio of two quantities which are both going to zero, yet the ratio can have a finite, nonzero limit.
To prove the chain rule rigorously, you should actually consider two distinct cases. First you should assume $f$ is differentiable at $g(x)$, $g$ is differentiable at $x$, and $g'(x) \neq 0$. Then you can do something which actually looks like the "cancellation of differentials". Specifically you can write:
$$\frac{f(g(x+h))-f(g(x))}{h}=\frac{f(g(x+h))-f(g(x))}{g(x+h)-g(x)} \frac{g(x+h)-g(x)}{h}.$$
Now you split into two limits and send $h \to 0$. Because of the additional assumption that $g'(x) \neq 0$, we can actually compute the limit of the first factor, and everything turns out to be OK. (It is not completely automatic, but it is quite doable.)
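To sketch why the first factor behaves, under the assumption $g'(x) \neq 0$: if every interval around $x$ contained points $x + h \neq x$ with $g(x+h) = g(x)$, the difference quotient of $g$ would vanish along a sequence $h_n \to 0$, forcing $g'(x) = 0$. So there is a $\delta > 0$ with $g(x+h) \neq g(x)$ for $0 < |h| < \delta$, and the first factor is defined there. Writing $k(h) = g(x+h) - g(x)$, continuity of $g$ at $x$ gives $k(h) \to 0$ with $k(h) \neq 0$, hence $$\lim_{h \to 0}\frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} = \lim_{h \to 0}\frac{f(g(x) + k(h)) - f(g(x))}{k(h)} = f'(g(x)).$$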
Second, you should separately prove that if instead $g'(x)=0$, then $(f \circ g)'(x)=0$.
What your book is doing in these problems, where differentials are replaced by small finite numbers, is an approximation. Specifically, part of the point of the derivative is that $\Delta y \approx \frac{dy}{dx} \Delta x$ provided $\Delta x$ is sufficiently small. But this is only approximate: we do not have anything like "$dy=\Delta y$". And indeed, we should use "$\approx$", not "$=$", in these problems.
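As a concrete numerical sketch of $\Delta y \approx \frac{dy}{dx}\Delta x$, take $f(x) = x^2$ at $x_0 = 2$ (the function, point, and step sizes are arbitrary choices for illustration). For this $f$ the error is exactly $(\Delta x)^2$, which shrinks faster than $\Delta x$ itself:

```python
# Illustrates Delta y ~= f'(x0) * Delta x for f(x) = x^2 at x0 = 2.
# Here Delta y = 2*x0*dx + dx^2, so the error of the linear
# approximation is exactly dx^2.
def f(x):
    return x ** 2

x0 = 2.0
fprime = 2 * x0  # exact derivative of x^2 at x0

for dx in (0.1, 0.01, 0.001):
    dy = f(x0 + dx) - f(x0)   # the actual change, Delta y
    approx = fprime * dx      # the linear approximation, f'(x0) * Delta x
    print(f"dx={dx}: dy={dy:.6f}, approx={approx:.6f}, error={dy - approx:.6f}")
```

Note that the error column shrinks like $(\Delta x)^2$, which is why the approximation is useful even though "$dy = \Delta y$" never holds.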
* There are ways to properly handle these infinitesimal quantities as separate entities. Collectively these ways are called nonstandard analysis. For various reasons, none of these approaches are popular.
Best Answer
To expand on my comment, the fundamental issue is that $g(x+h) - g(x)$ may vanish in any neighbourhood around $h=0$. The issue of $g'(x)$ being $0$ (though certainly a mistake in the proof) is not that important, since this "proof" can be trivially modified so that the $g'(x)$ term stays on the right hand side. For example,
$$\frac{f(g(x+h))-f(g(x))}{h} = \frac{f(g(x+h))-f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h} \ \ \ \ \ \ \ \ \ (1)$$
and let $h \to 0$ in the equation. The problem, again, is that we may have $g(x+h) = g(x)$ in every neighbourhood of $h = 0$ for certain badly behaved functions (e.g. $g(t) = t^2\sin\frac{1}{t}$ with $g(0) = 0$, at $x = 0$).
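To see the bad behaviour concretely, here is a small numerical check on the example above: at $h_n = \frac{1}{n\pi}$ we have $\sin(1/h_n) = \sin(n\pi) = 0$, so $g(h_n) = 0 = g(0)$ and the denominator $g(x+h) - g(x)$ in $(1)$ vanishes at points arbitrarily close to $h = 0$.

```python
import math

def g(t):
    # g(t) = t^2 sin(1/t) for t != 0, extended by g(0) = 0.
    return t * t * math.sin(1.0 / t) if t != 0 else 0.0

# h_n = 1/(n*pi) -> 0 as n grows, yet g(h_n) = 0 = g(0): the
# denominator g(0 + h) - g(0) in (1) vanishes arbitrarily near h = 0.
for n in (1, 10, 100, 1000):
    h = 1.0 / (n * math.pi)
    print(f"h = {h:.6e}, g(h) = {g(h):.3e}")  # ~0 up to float rounding
```

(The printed values of $g(h)$ are not exactly zero only because `math.pi` is a finite-precision approximation of $\pi$.)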
The trick (credit to Michael Spivak) is as follows. Define $$\sigma(h) = \begin{cases} f'(g(x)) & g(x+h) = g(x) \\ \ \frac{f(g(x+h))-f(g(x))}{g(x+h) - g(x)} & \text{otherwise} \end{cases}$$
and note that as $h \to 0$, this tends to $f'(g(x))$ without any division by zero problems. Now, substitute $\sigma(h)$ for the first fraction on the RHS in $(1)$ and let $h \to 0$. The substitution is justified because the equality in the modified version of $(1)$ will always hold (can you see why?).
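One way to see the "why" in the last sentence: for every $h \neq 0$ (with $x + h$ in the domain) we have the identity $$f(g(x+h)) - f(g(x)) = \sigma(h)\,\bigl(g(x+h) - g(x)\bigr),$$ since when $g(x+h) = g(x)$ both sides are $0$, and otherwise it is just the definition of $\sigma$. Dividing by $h$ and letting $h \to 0$ gives $$(f \circ g)'(x) = \lim_{h \to 0}\,\sigma(h)\cdot\frac{g(x+h) - g(x)}{h} = f'(g(x))\,g'(x),$$ with no division by $g(x+h) - g(x)$ anywhere.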