The problem with the intuition of cancelling differentials is that it isn't safe. And yet, the method of differentials is stupidly successful.
Let me give a standard example of intuition's downfall. First, since the partials seem to cancel,
$$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = 1$$
except that it doesn't. Actually, with the right interpretation,
$$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = -1.$$
In particular, suppose $x,y,z$ are related by some level equation $F(x,y,z)=0$. Then $dF = F_x\,dx+F_y\,dy+F_z\,dz = 0$ along the level set, thus
$$ \frac{\partial z}{\partial y} = \frac{dz}{dy}\bigg{|}_{dx=0} = -\frac{F_y}{F_z}$$
In words: if we consider $z$ as a function of $x,y$, then the partial derivative of $z$ with respect to $y$, holding $x$ fixed, is $-F_y/F_z$. Notice, I simply take the total differential of $F$ and solve for $dz/dy$ while setting $dx=0$. This is an example of how differential notation is naively successful (a careful application of the implicit function theorem yields the same outcome). Likewise, intuitive calculation with $dx,dy,dz$ yields
$$ \frac{\partial y}{\partial x} = \frac{dy}{dx}\bigg{|}_{dz=0} = -\frac{F_x}{F_y}$$
$$ \frac{\partial x}{\partial z} = \frac{dx}{dz}\bigg{|}_{dy=0} = -\frac{F_z}{F_x}$$
Thus,
$$ \frac{\partial z}{\partial y}\frac{\partial y}{\partial x}\frac{\partial x}{\partial z} = \left(-\frac{F_y}{F_z}\right)\left(-\frac{F_x}{F_y}\right)\left(-\frac{F_z}{F_x}\right) = -1.$$
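Both the implicit partials and the $-1$ product are easy to sanity-check symbolically. Here is a minimal sketch using sympy; the level function $F = xy + z$ is a hypothetical choice, and the identity holds for any $F$ with nonvanishing partials:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = x*y + z  # hypothetical level function F(x,y,z) = 0

# partials of F
Fx, Fy, Fz = sp.diff(F, x), sp.diff(F, y), sp.diff(F, z)

# implicit partials read off from F_x dx + F_y dy + F_z dz = 0
dz_dy = -Fy / Fz
dy_dx = -Fx / Fy
dx_dz = -Fz / Fx

# the triple product is -1, not +1
print(sp.simplify(dz_dy * dy_dx * dx_dz))  # -1
```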
Getting back to the posed question: why are there sums of derivatives? Well, in short, because a multivariate function can change in all of its arguments. As the derivative is a linear approximation to the change in the function, we have little hope except to see formulas formed from sums of all the possible things which can change the outcome. This is the multivariate chain rule; it accounts for each entry in an entirely symmetrical manner. Still, these sorts of explanations don't settle well with me. The real answer, in my estimation, is matrix multiplication. The chain rules really fall out of multiplication of Jacobian matrices, which in turn come from the chain rule in its pure form $D_a(F \circ G) = D_{G(a)}F \circ D_aG$. But perhaps this isn't intuition. That said, it is my intuition.
I'll add a little example to explain how the matrix multiplication works together with the Jacobian matrix to capture the chain rule. Suppose $\vec{X}: \mathbb{R}^2_{uv} \rightarrow \mathbb{R}^3_{xyz}$ and $\vec{F} = \langle P, Q, R \rangle : \mathbb{R}^3_{xyz} \rightarrow \mathbb{R}^3$. Here I use the notation $\mathbb{R}^2_{uv}$ to indicate $u,v$ serve as the coordinates. Here you can think of $\vec{X}$ as a parametrization of a surface and $\vec{F}$ as a vector field in three dimensional space. The composition $\vec{F} \circ \vec{X}$ is commonly considered in the calculation of flux of $\vec{F}$ through the surface parametrized by $\vec{X}$. In this case, the Jacobian of $\vec{X}$ is given by
$$ J_{\vec{X}} = \left[ \frac{\partial \vec{X}}{\partial u} |\frac{\partial \vec{X}}{\partial v}\right] = \left[\begin{array}{cc} \partial_u x & \partial_v x \\
\partial_u y & \partial_v y \\
\partial_u z & \partial_v z \end{array} \right]$$
and the Jacobian of $\vec{F}$ is given by
$$ J_{\vec{F}} = \left[ \frac{\partial \vec{F}}{\partial x}|
\frac{\partial \vec{F}}{\partial y}|
\frac{\partial \vec{F}}{\partial z} \right] = \left[
\begin{array}{ccc}
\partial_x P & \partial_y P & \partial_z P \\
\partial_x Q & \partial_y Q & \partial_z Q \\
\partial_x R & \partial_y R & \partial_z R \\
\end{array} \right]$$
Setting $\vec{G} = \vec{F} \circ \vec{X}$, we find from the matrix form of the chain rule (suppressing point dependence) that:
\begin{align} J_{\vec{G}} &= J_{\vec{F}}J_{\vec{X}} \\
&= \left[
\begin{array}{ccc}
\partial_x P & \partial_y P & \partial_z P \\
\partial_x Q & \partial_y Q & \partial_z Q \\
\partial_x R & \partial_y R & \partial_z R \\
\end{array} \right]\left[\begin{array}{cc} \partial_u x & \partial_v x \\
\partial_u y & \partial_v y \\
\partial_u z & \partial_v z \end{array} \right] \\
&=
\left[\begin{array}{c|c}
\partial_x P\partial_u x +\partial_y P \partial_u y + \partial_z P\partial_u z
&\partial_x P\partial_v x +\partial_y P \partial_v y + \partial_z P\partial_v z \\
\partial_x Q\partial_u x +\partial_y Q \partial_u y + \partial_z Q\partial_u z
&\partial_x Q\partial_v x +\partial_y Q \partial_v y + \partial_z Q\partial_v z \\
\partial_x R\partial_u x +\partial_y R \partial_u y + \partial_z R\partial_u z
&\partial_x R\partial_v x +\partial_y R \partial_v y + \partial_z R\partial_v z
\end{array} \right]
\end{align}
For example, in the $(1,1)$ entry we read off:
$$ \frac{\partial G^1}{\partial u} = \frac{\partial}{\partial u} \left[P(x(u,v), y(u,v), z(u,v))\right] =
\frac{\partial P}{\partial x}\frac{\partial x}{\partial u} +
\frac{\partial P}{\partial y}\frac{\partial y}{\partial u} +
\frac{\partial P}{\partial z}\frac{\partial z}{\partial u}
$$
Notice the matrix $J_{\vec{G}}$ contains all $6$ interesting chain rules arising from composing the component functions $P,Q,R$ of $\vec{F}$ with the component functions $x,y,z$ of $\vec{X}$.
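The identity $J_{\vec{G}} = J_{\vec{F}}\,J_{\vec{X}}$ can be checked symbolically for a concrete choice. In the sketch below, the parametrization $\vec{X}(u,v)$ and the vector field $\vec{F} = \langle P,Q,R\rangle$ are hypothetical placeholders, chosen only to make the computation explicit:

```python
import sympy as sp

u, v = sp.symbols('u v')
x, y, z = sp.symbols('x y z')

# hypothetical surface parametrization X(u,v) = (x(u,v), y(u,v), z(u,v))
x_, y_, z_ = u*v, u + v, u**2
X = sp.Matrix([x_, y_, z_])

# hypothetical vector field F = <P, Q, R>
Fvec = sp.Matrix([x*y, y*z, z*x])

J_X = X.jacobian([u, v])          # 3x2 Jacobian of the parametrization
J_F = Fvec.jacobian([x, y, z])    # 3x3 Jacobian of the vector field

# compose first, then differentiate: G = F o X
G = Fvec.subs({x: x_, y: y_, z: z_})
J_G = G.jacobian([u, v])

# chain rule: J_G equals J_F evaluated along X, times J_X
rhs = J_F.subs({x: x_, y: y_, z: z_}) * J_X
assert sp.simplify(J_G - rhs) == sp.zeros(3, 2)
```

Each entry of the resulting $3\times 2$ matrix is one of the six component chain rules read off above.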
Presumably we are saying that $f$ is a function of $x$ and $y$ (i.e., $f(x, y)$), which are both functions of $t$ (i.e., $x(t)$ and $y(t)$). So what does it mean to write $df/dt$? This is really the derivative of another function $F$ defined by
$$F(t) = f(x(t), y(t)).$$
Define the function $g$ by $g(t) = (x(t), y(t))$ so that $F(t) = f(g(t)) = f \circ g(t)$.
Recall the multivariable chain rule.
Theorem (Multivariable Chain Rule). Suppose $g\colon \mathbf{R}^n \to \mathbf{R}^m$ is differentiable at $a \in \mathbf{R}^n$ and $f\colon \mathbf{R}^m \to \mathbf{R}^p$ is differentiable at $g(a) \in \mathbf{R}^m$. Then $f \circ g\colon \mathbf{R}^n \to \mathbf{R}^p$ is differentiable at $a$, and its derivative at this point is given by
$$D_a(f \circ g) = D_{g(a)}(f) \ D_a(g).$$
You can find a proof of this in, e.g., Calculus on Manifolds (Spivak). Back to the problem at hand: how do we use the chain rule to prove that
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}?$$
Well, let's try writing this in terms of a "matrix" product,
$$\frac{df}{dt} = \begin{bmatrix}\dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\end{bmatrix}\begin{pmatrix}dx/dt\\dy/dt\end{pmatrix}.$$
But this is exactly what the chain rule states when applied to the function $F = f \circ g$. We have that
- $D_a(f \circ g) = D_a(F) = \dfrac{dF}{dt}$ (evaluated at some point $a$)
- $D_{g(a)}(f) = \begin{bmatrix}\dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\end{bmatrix}$ (each term evaluated at $g(a)$)
- $D_a(g) = \displaystyle \begin{pmatrix}dx/dt\\dy/dt\end{pmatrix}$ (each term evaluated at $a$)
where we have assumed differentiability of the maps.
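The scalar identity above can also be verified symbolically. The curves $x(t), y(t)$ and the function $f$ below are hypothetical choices for illustration:

```python
import sympy as sp

t = sp.symbols('t')
x, y = sp.symbols('x y')

x_t, y_t = sp.cos(t), sp.sin(t)   # hypothetical curves x(t), y(t)
f = x**2 * y                      # hypothetical f(x, y)

# F(t) = f(x(t), y(t)), differentiated directly
F = f.subs({x: x_t, y: y_t})
lhs = sp.diff(F, t)

# chain rule: f_x * dx/dt + f_y * dy/dt, partials evaluated along the curve
rhs = (sp.diff(f, x).subs({x: x_t, y: y_t}) * sp.diff(x_t, t)
       + sp.diff(f, y).subs({x: x_t, y: y_t}) * sp.diff(y_t, t))

assert sp.simplify(lhs - rhs) == 0
```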
Nobody should use the "fraction" approach.
To provide intuition I tend to fall back on linear approximation.
If we write $$f(x+\epsilon)\approx f(x)+ f'(x)\epsilon,$$
then $$f\circ g(x+\epsilon)\approx f\circ g(x)+(f\circ g)'(x)\epsilon.$$
But we could also write $$f\circ g(x+\epsilon)\approx f(g(x)+ g'(x)\epsilon)\approx f\circ g (x)+f'(g(x))g'(x)\epsilon,$$
and comparing the two shows that $$(f\circ g)'(x)=f'(g(x))g'(x),$$ as desired.
Of course, to make this rigorous one has to argue that the coefficient in the linear approximation is uniquely determined, and so on; but students ought to be aware that this interpretation of the derivative is an important tool in numerical analysis, and the chain rule drops out of it.
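The linear-approximation view translates directly into a numerical check: estimate each derivative by a symmetric difference quotient and compare $(f\circ g)'(x)$ against $f'(g(x))\,g'(x)$. The functions below are hypothetical choices:

```python
import math

def f(x): return math.sin(x)   # hypothetical outer function
def g(x): return x**2          # hypothetical inner function

def deriv(h, x, eps=1e-6):
    # symmetric difference quotient: the coefficient of the linear approximation
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x0 = 0.7
composed = deriv(lambda x: f(g(x)), x0)   # (f o g)'(x0) estimated directly
chain = deriv(f, g(x0)) * deriv(g, x0)    # f'(g(x0)) * g'(x0)
assert abs(composed - chain) < 1e-6
```

Both estimates agree to well within the truncation error of the difference quotient, which is the numerical shadow of the uniqueness argument mentioned above.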