The function $f:\mathbb{R}^2 \to \mathbb{R}$ has a total derivative at a point $x$ if there exists a linear operator $Df(x)(\cdot)$ such that for every $\epsilon >0$ there is a $\delta > 0$ such that if $0 < ||h|| < \delta$, then
$$|f(x+h) -f(x) - Df(x)(h)| < \epsilon||h||.$$
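Assume that $\partial_1 f$ and $\partial_2 f$ exist in a neighborhood of $x$ and are continuous at $x$. We show that the definition is satisfied by the linear operator
$$Df(x)(h) = \partial_1f(x_1,x_2)h_1+\partial_2f(x_1,x_2)h_2.$$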
Now consider the following path from $x = (x_1,x_2)$ to $x+h =(x_1+h_1,x_2+h_2)$:
$$ (x_1,x_2) \rightarrow(x_1+h_1,x_2) \rightarrow(x_1+h_1,x_2+h_2).$$
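Applying the one-variable mean value theorem on each segment of this path (only one coordinate changes along each segment), we get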
$$
\begin{aligned}
&|f(x_1+h_1,x_2+h_2) - f(x_1,x_2) - \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad= |f(x_1+h_1,x_2+h_2) - f(x_1+h_1,x_2) + f(x_1+h_1,x_2) - f(x_1,x_2) - \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad= |\partial_2f(x_1+h_1,\xi)h_2 + \partial_1f(\eta,x_2)h_1 - \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad\leq |\partial_1f(\eta,x_2)-\partial_1f(x_1,x_2)|\,|h_1| + |\partial_2f(x_1+h_1,\xi)-\partial_2f(x_1,x_2)|\,|h_2|
\end{aligned}
$$
where $\eta$ lies between $x_1$ and $x_1+h_1$, and $\xi$ lies between $x_2$ and $x_2+h_2$ (the increments $h_1, h_2$ may have either sign).
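Both points $(\eta,x_2)$ and $(x_1+h_1,\xi)$ lie within distance $||h||$ of $x$. Since the partial derivatives are continuous at $x = (x_1,x_2)$, there exists $\delta >0$ (also small enough that the whole path stays inside the neighborhood where the partials exist) such that if $0 < ||h|| < \delta$, then
$$
|\partial_1f(\eta,x_2)-\partial_1f(x_1,x_2)| < \frac{\epsilon}{\sqrt{2}}, \qquad |\partial_2f(x_1+h_1,\xi)-\partial_2f(x_1,x_2)| < \frac{\epsilon}{\sqrt{2}}.
$$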
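Applying the Cauchy-Schwarz inequality $a|h_1|+b|h_2| \le \sqrt{a^2+b^2}\,\sqrt{h_1^2+h_2^2} = \sqrt{a^2+b^2}\,||h||$ to the last line of the estimate above, we get
$$
\begin{aligned}
&|f(x_1+h_1,x_2+h_2) - f(x_1,x_2) - \partial_1f(x_1,x_2)h_1 - \partial_2f(x_1,x_2)h_2| \\
&\quad< \sqrt{(\epsilon/\sqrt{2})^2+(\epsilon/\sqrt{2})^2}\,||h|| = \epsilon\,||h||,
\end{aligned}
$$
so $f$ is differentiable at $x$ with $Df(x)(h) = \partial_1f(x_1,x_2)h_1+\partial_2f(x_1,x_2)h_2$.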
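As a quick numerical sanity check of the conclusion, here is a minimal sketch that evaluates the quotient $|f(x+h)-f(x)-Df(x)(h)|/||h||$ for shrinking $h$. The test function $f(x_1,x_2) = x_1^2x_2 + \sin x_2$, the base point $(1, 0.5)$ and the direction $(0.6,-0.8)$ are hypothetical choices made only for illustration; any $f$ with continuous partials should show the quotient tending to $0$.

```python
import math


# Hypothetical test function with continuous partial derivatives everywhere.
def f(x1, x2):
    return x1**2 * x2 + math.sin(x2)


def d1f(x1, x2):
    # Exact partial derivative with respect to x1.
    return 2 * x1 * x2


def d2f(x1, x2):
    # Exact partial derivative with respect to x2.
    return x1**2 + math.cos(x2)


def remainder_ratio(x, h):
    """|f(x+h) - f(x) - Df(x)(h)| / ||h||, with Df(x)(h) = d1f*h1 + d2f*h2."""
    x1, x2 = x
    h1, h2 = h
    linear = d1f(x1, x2) * h1 + d2f(x1, x2) * h2
    numerator = abs(f(x1 + h1, x2 + h2) - f(x1, x2) - linear)
    return numerator / math.hypot(h1, h2)


if __name__ == "__main__":
    x = (1.0, 0.5)
    direction = (0.6, -0.8)  # a unit vector
    for k in range(1, 7):
        t = 10.0 ** (-k)  # t = ||h||
        h = (t * direction[0], t * direction[1])
        print(f"||h|| = {t:.0e}   ratio = {remainder_ratio(x, h):.3e}")
```

Because this particular $f$ is smooth, the ratio shrinks roughly linearly in $||h||$; the proof above only guarantees that it tends to $0$.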
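The same argument generalizes to $f:\mathbb{R}^d \to \mathbb{R}$ for $d > 2$: move from $x$ to $x+h$ along $d$ axis-parallel segments, apply the mean value theorem on each, and require each partial-derivative difference to be less than $\epsilon/\sqrt{d}$.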
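A sketch of the $d$-dimensional estimate, under the same hypotheses (partials exist near $x$ and are continuous at $x$). The notation $h^{(k)} = (h_1,\dots,h_k,0,\dots,0)$, with $h^{(0)} = 0$ and $h^{(d)} = h$, is introduced here only for this sketch; $\zeta^{(k)}$ is the mean-value point on the segment from $x+h^{(k-1)}$ to $x+h^{(k)}$, and $Df(x)(h) = \sum_k \partial_k f(x)\,h_k$:
$$
\begin{aligned}
f(x+h)-f(x) &= \sum_{k=1}^{d}\bigl[f(x+h^{(k)})-f(x+h^{(k-1)})\bigr] = \sum_{k=1}^{d}\partial_k f(\zeta^{(k)})\,h_k,\\
|f(x+h)-f(x)-Df(x)(h)| &\le \sum_{k=1}^{d}|\partial_k f(\zeta^{(k)})-\partial_k f(x)|\,|h_k| \le \Bigl(\sum_{k=1}^{d}|\partial_k f(\zeta^{(k)})-\partial_k f(x)|^2\Bigr)^{1/2}||h||.
\end{aligned}
$$
Since $||\zeta^{(k)}-x|| \le ||h||$, continuity of the partials at $x$ makes each difference smaller than $\epsilon/\sqrt{d}$ once $||h||$ is small, so the right-hand side is less than $\epsilon\,||h||$. For $d=2$ this reduces to the computation above, with $\zeta^{(1)} = (\eta,x_2)$ and $\zeta^{(2)} = (x_1+h_1,\xi)$.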
In the usual statement of Arzelà-Ascoli, the distinction between pointwise and uniform equicontinuity does not matter, because the domain is a compact metric space, where uniform equicontinuity is equivalent to pointwise equicontinuity. (Pointwise equicontinuity gives all the functions a common modulus of continuity at each point, and an argument exactly parallel to the proof of the Heine-Cantor theorem then furnishes a uniform modulus of continuity for all of them.) Since this is the main application of the concept, some authors use the word "equicontinuity" to mean what would more precisely be called "uniform equicontinuity". Others prefer the more precise term.
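For reference, the two notions differ only in where $\delta$ is chosen. For a family $\mathcal F$ of real-valued functions on a metric space $(X,d)$:
$$
\begin{aligned}
&\text{pointwise:} && \forall x\in X\ \forall \epsilon>0\ \exists \delta>0\ \forall f\in\mathcal F\ \forall y\in X:\ d(x,y)<\delta \implies |f(x)-f(y)|<\epsilon,\\
&\text{uniform:} && \forall \epsilon>0\ \exists \delta>0\ \forall f\in\mathcal F\ \forall x,y\in X:\ d(x,y)<\delta \implies |f(x)-f(y)|<\epsilon.
\end{aligned}
$$
In the pointwise version $\delta$ may depend on $x$. On the non-compact space $\mathbb R$, for example, the one-element family $\{x\mapsto x^2\}$ is pointwise equicontinuous (a single continuous function always is) but not uniformly equicontinuous, since $|(x+h)^2-x^2| = |2xh+h^2|$ grows without bound in $x$ for any fixed $h\neq 0$.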
The textbooks I have in mind are those of Strichartz and Fitzpatrick. Strichartz uses "uniform equicontinuity" throughout. Fitzpatrick defines "equicontinuity" in the pointwise sense and leaves the equivalence mentioned above as an exercise, rather than dwelling on it in his proof of Arzelà-Ascoli.
I think that the claim does not hold for general non-convex domains.
If you take $f$ to be the standard (polar) angle function on $E=\mathbb R^2\setminus([0,\infty)\times\{0\})$, then $f:E \to (0,2\pi)$ is a counterexample.
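Indeed, take $x$ and $a$ on the unit circle $\mathbb S^1$, both close to $(1,0)$, with the angle of $x$ approaching $0$ from above and the angle of $a$ approaching $2\pi$ from below. Then $x-a \to 0$ while $|f(x) - f(a)| \to 2\pi$. Since $\nabla f(a) = (-a_2, a_1)/|a|^2$ has norm $1$ on $\mathbb S^1$, the term $\nabla f(a)\cdot(x-a)$ also tends to $0$. Thus the numerator of the fraction
$$ \frac{|f(x) - f(a) - \nabla f (a)\cdot (x-a)|}{|x-a|} $$
tends to $2\pi$ while the denominator tends to $0$, so the fraction blows up.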
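The blow-up can be checked numerically with a minimal sketch. The angle in $(0,2\pi)$ is computed from `math.atan2` (shifted by $2\pi$ when negative), and its gradient is $(-y, x)/(x^2+y^2)$. The function names and the choice $x = (\cos t, \sin t)$, $a = (\cos t, -\sin t)$ are illustrative assumptions, not part of the original answer.

```python
import math


def angle(p):
    """Polar angle of p in (0, 2*pi); defined off the ray [0, inf) x {0}."""
    u, v = p
    theta = math.atan2(v, u)  # lies in [-pi, pi]
    return theta if theta > 0 else theta + 2 * math.pi


def grad_angle(p):
    """Gradient of the angle function: (-y, x) / (x^2 + y^2)."""
    u, v = p
    r2 = u * u + v * v
    return (-v / r2, u / r2)


def quotient(x, a):
    """|f(x) - f(a) - grad f(a).(x - a)| / |x - a| for f = angle."""
    gx, gy = grad_angle(a)
    dx, dy = x[0] - a[0], x[1] - a[1]
    numerator = abs(angle(x) - angle(a) - (gx * dx + gy * dy))
    return numerator / math.hypot(dx, dy)


if __name__ == "__main__":
    for t in [1e-1, 1e-2, 1e-3, 1e-4]:
        x = (math.cos(t), math.sin(t))    # angle t, just above 0
        a = (math.cos(t), -math.sin(t))   # angle 2*pi - t, just below 2*pi
        dist = math.hypot(x[0] - a[0], x[1] - a[1])
        print(f"|x - a| = {dist:.2e}   quotient = {quotient(x, a):.3e}")
```

Here $|x-a| = 2\sin t$ and the quotient behaves roughly like $\pi/t$. For a fixed $a$ and $x \to a$ within $E$ the quotient does tend to $0$ (the angle function is differentiable on $E$); it is only the joint limit with both points moving across the slit that fails.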