Presumably we are saying that $f$ is a function of $x$ and $y$ (i.e., $f(x, y)$), which are both functions of $t$ (i.e., $x(t)$ and $y(t)$). So what does it mean to write $df/dt$? This is really the derivative of another function $F$ defined by
$$F(t) = f(x(t), y(t)).$$
Define the function $g$ by $g(t) = (x(t), y(t))$ so that $F(t) = f(g(t)) = (f \circ g)(t)$.
Recall the multivariable chain rule.
Theorem (Multivariable Chain Rule). Suppose $g\colon \mathbf{R}^n \to \mathbf{R}^m$ is differentiable at $a \in \mathbf{R}^n$ and $f\colon \mathbf{R}^m \to \mathbf{R}^p$ is differentiable at $g(a) \in \mathbf{R}^m$. Then $f \circ g\colon \mathbf{R}^n \to \mathbf{R}^p$ is differentiable at $a$, and its derivative at this point is given by
$$D_a(f \circ g) = D_{g(a)}(f) \ D_a(g).$$
You can find a proof of this in, e.g., Calculus on Manifolds (Spivak). Back to the problem at hand: how do we use the chain rule to prove that
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}?$$
Well, let's try writing this in terms of a "matrix" product,
$$\frac{df}{dt} = \begin{bmatrix}\dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\end{bmatrix}\begin{pmatrix}dx/dt\\dy/dt\end{pmatrix}.$$
But this is exactly what the chain rule states when applied to the function $F = f \circ g$. We have that
- $D_a(f \circ g) = D_a(F) = \dfrac{dF}{dt}$ (evaluated at some point $a$)
- $D_{g(a)}(f) = \begin{bmatrix}\dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\end{bmatrix}$ (each term evaluated at $g(a)$)
- $D_a(g) = \displaystyle \begin{pmatrix}dx/dt\\dy/dt\end{pmatrix}$ (each term evaluated at $a$)
where we have assumed differentiability of the maps.
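A quick numerical sanity check of this matrix product can be sketched as follows. The concrete choices $f(x, y) = x^2 y$, $x(t) = \cos t$, $y(t) = \sin t$ and the helper names are assumptions made purely for illustration; they do not come from the question.

```python
import math

# Illustrative choices (assumptions, not from the question):
# f(x, y) = x^2 * y and g(t) = (cos t, sin t).
def f(x, y):
    return x**2 * y

def g(t):
    return (math.cos(t), math.sin(t))

def F(t):
    # F = f o g, a function of the single variable t
    return f(*g(t))

def chain_rule_derivative(t):
    x, y = g(t)
    # Row matrix D_{g(t)} f = [df/dx, df/dy], evaluated at g(t)
    df_dx, df_dy = 2 * x * y, x**2
    # Column matrix D_t g = (dx/dt, dy/dt), evaluated at t
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    # 1x2 times 2x1 matrix product
    return df_dx * dx_dt + df_dy * dy_dt

def central_difference(func, t, h=1e-6):
    # Symmetric difference quotient approximating func'(t)
    return (func(t + h) - func(t - h)) / (2 * h)

t0 = 0.7
print(chain_rule_derivative(t0), central_difference(F, t0))
```

The two printed numbers should agree to many decimal places, which is what the chain rule predicts for $dF/dt$.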
Lately I've been looking for a rigorous proof of the multivariable chain rule, and I've finally found one that I think is very easy to understand. I will leave it here (if nobody minds) for anybody searching for this who is not familiar with little-o notation, Jacobians, and the like. To understand this proof, all you need to know is the mean value theorem.
Let's say we have a function $f(x,y)$ with $x = x(t)$ and $y = y(t)$, and let $z(t) = f(x(t), y(t))$. By definition, the derivative $z'(t)$ is
$$ z'(t) = \lim_{\Delta t \to 0}{\frac {f(x(t+\Delta t),y(t+\Delta t)) - f(x(t),y(t))}{\Delta t}}. $$
Throughout, $x$ and $y$ written without an argument mean $x(t)$ and $y(t)$. Let
$$ \Delta x = x(t+\Delta t)-x(t), \qquad \Delta y = y(t+\Delta t)-y(t). $$
Now I'll take the numerator of the fraction in the limit, and make a small change.
$$ f(x(t+\Delta t), y(t+\Delta t)) - f(x,y) = f(x+\Delta x, y+\Delta y) - f(x,y)$$
$$ = \left[f(x+\Delta x, y+\Delta y) - f(x+\Delta x, y)\right] + \left[f(x+\Delta x, y) - f(x, y)\right]$$
I have just added and subtracted $f(x+\Delta x, y)$. For convenience, I will swap the order of the two brackets:
$$ = \left[f(x+\Delta x, y) - f(x, y)\right] + \left[f(x+\Delta x, y+\Delta y) - f(x+\Delta x, y)\right]. $$
Now, let's define two functions, which I will name $g$ and $h$. First, let
$$ g(x) = f(x, y) \implies g'(x) = \frac {\partial f} {\partial x}(x, y). $$
Please note that $y$ is held constant here, since $g$ is a function of a single variable. Now, by the mean value theorem (for $\Delta x \neq 0$), there exists $c_1$ strictly between $x$ and $x+\Delta x$ such that
$$\frac {g(x+\Delta x) - g(x)} {\Delta x} = g'(c_1), $$
that is,
$$ f(x+\Delta x, y) - f(x, y) = f_x(c_1, y)\Delta x. $$
(If $\Delta x = 0$, the last equation holds trivially with $c_1 = x$.)
Similarly, using the function
$$ h(y) = f(x + \Delta x, y) \implies h'(y) = f_y(x+\Delta x, y), $$
we have by the same logic that
$$ f(x+\Delta x, y + \Delta y) - f(x+\Delta x, y) = f_y(x + \Delta x, c_2)\Delta y $$
for some $c_2$ between $y$ and $y+\Delta y$.
Notice that $c_1$ is trapped between $x$ and $x+\Delta x$, and $c_2$ between $y$ and $y+\Delta y$. So as $\Delta x \to 0$, $c_1 \to x$, and as $\Delta y \to 0$, $c_2 \to y$. Since $x(t)$ and $y(t)$ are differentiable, hence continuous, both $\Delta x \to 0$ and $\Delta y \to 0$ as $\Delta t \to 0$. So, as $\Delta t \to 0$, $c_1 \to x$ and $c_2 \to y$.
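As a concrete illustration (an example chosen here for simplicity, not part of the proof itself), take $f(x, y) = x^2 y$ with $y \neq 0$. Then $f_x(x, y) = 2xy$ and
$$ f(x+\Delta x, y) - f(x, y) = \left(2x\,\Delta x + (\Delta x)^2\right) y = 2\left(x + \tfrac{\Delta x}{2}\right) y\,\Delta x = f_x\!\left(x + \tfrac{\Delta x}{2},\, y\right)\Delta x, $$
so in this case the mean value point is the midpoint $c_1 = x + \Delta x/2$, which visibly tends to $x$ as $\Delta x \to 0$. Here $f_y(x, y) = x^2$ does not depend on $y$, so any $c_2$ between $y$ and $y + \Delta y$ works.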
The last step of the proof is to sum this all up, divide by $\Delta t$, and take the limit as $\Delta t \to 0$:
$$ f(x(t+\Delta t), y(t+\Delta t)) - f(x, y) = f_x(c_1, y)\Delta x + f_y(x+\Delta x, c_2)\Delta y, $$
$$ \lim_{\Delta t \to 0} \frac {f(x(t+\Delta t), y(t+\Delta t)) - f(x, y)}{\Delta t} = \lim_{\Delta t \to 0} \left[ f_x(c_1, y)\frac {\Delta x}{\Delta t} + f_y(x+\Delta x, c_2)\frac {\Delta y}{\Delta t} \right] = f_x(x, y)x'(t) + f_y(x, y)y'(t). \quad \text{QED} $$
Edit: After a long time I've realised that this proof assumes that $f$ has partial derivatives defined in a neighbourhood of the point $(x, y)$ and that they are continuous at that point. This is a sufficient condition for $f$ to be ($\mathbb{R}^2$-)differentiable at $(x, y)$, but it is not a necessary one. Yet the multivariable chain rule holds as soon as $f$ is differentiable at that point. So for a general proof, one should first understand little-o notation, as in the other answers.
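To see that some hypothesis beyond the mere existence of the partial derivatives really is needed, here is a small numerical sketch. The function $f(x, y) = x^2 y / (x^2 + y^2)$ (with $f(0, 0) = 0$) and the path $x(t) = y(t) = t$ are a standard textbook-style counterexample that I am assuming for illustration; they are not part of the proof above.

```python
# Assumed illustrative counterexample: both partials exist at the origin,
# but f is not differentiable there, and the chain rule formula fails.
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x**2 * y / (x**2 + y**2)

def partial_x_at_origin(h):
    # Difference quotient for f_x(0, 0); f vanishes on the x-axis
    return (f(h, 0.0) - f(0.0, 0.0)) / h

def partial_y_at_origin(h):
    # Difference quotient for f_y(0, 0); f vanishes on the y-axis
    return (f(0.0, h) - f(0.0, 0.0)) / h

def z(t):
    # Composition along the path x(t) = t, y(t) = t, so z(t) = t/2
    return f(t, t)

def z_quotient(h):
    # Difference quotient for z'(0)
    return (z(h) - z(0.0)) / h

for h in (1e-1, 1e-3, 1e-5):
    formula = partial_x_at_origin(h) * 1.0 + partial_y_at_origin(h) * 1.0
    print(h, formula, z_quotient(h))
```

The "formula" column stays at 0 while the actual difference quotient of $z$ stays near $1/2$, so $f_x x' + f_y y'$ gives the wrong answer at $t = 0$ for this $f$.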
Best Answer
The derivative of $f\colon \mathbb R\to\mathbb R$ at $x$ is defined as $$ \lim_{a\to x} \frac{f(x)-f(a)}{x-a}. $$ The expression $\frac{f(x)-f(a)}{x-a}$ is defined whenever $x\neq a$. Here $a$ may be arbitrarily close to $x$ but as long as it doesn't equal $x$, the expression is well defined.
For the chain rule, you want to rewrite the expression $$ \frac{f(g(x))-f(g(a))}{x-a} $$ as $$ \frac{f(g(x))-f(g(a))}{g(x)-g(a)} \cdot \frac{g(x)-g(a)}{x-a}. $$ For this to be a well defined expression, you still need $x\neq a$ but you also need $g(x)\neq g(a)$. This is a problem since we might have $g(x)=g(a)$ even though $x\neq a$.
For example, consider the differentiable function $$ g(x) = \begin{cases} (x+1)^2 & \text{if $x\le -1$}, \\ 0 & \text{if $-1\le x\le 1$}, \\ (x-1)^2 & \text{if $1\le x$}. \end{cases} $$ When you want to take the derivative of $f(g(x))$ for some $f$ at $x=0$, you run into the problem that $g(a)=0=g(0)$ whenever $-1\le a\le 1$, so you can't divide by $g(x)-g(a)$ for $a$ close to $x$, even though you only consider $a\neq x$.
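The failure can be made concrete with a short sketch. The helper names `naive_quotient` and `direct_quotient`, and the choice $f = \sin$, are assumptions for illustration only.

```python
import math

def g(x):
    # The piecewise function from the example: flat (zero) on [-1, 1]
    if x <= -1:
        return (x + 1) ** 2
    if x <= 1:
        return 0.0
    return (x - 1) ** 2

def naive_quotient(f, x, a):
    # The rewritten expression: it divides by g(x) - g(a),
    # which is zero whenever both x and a lie in [-1, 1]
    dg = g(x) - g(a)
    return (f(g(x)) - f(g(a))) / dg * (dg / (x - a))

def direct_quotient(f, x, a):
    # The original difference quotient only needs x != a
    return (f(g(x)) - f(g(a))) / (x - a)

x, a = 0.0, 0.5
print(direct_quotient(math.sin, x, a))   # well defined
try:
    naive_quotient(math.sin, x, a)
except ZeroDivisionError:
    print("naive quotient is undefined: g(x) - g(a) == 0")
```

The original quotient is perfectly well defined for every $a \neq x$, while the factored form breaks down for every $a$ in $[-1, 1]$, no matter how close to $x = 0$.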