I have always disliked the definition of differentiable given in introductory multivariable calculus texts. Wikipedia has a much nicer definition which I will try to spell out.
The derivative is not as easily visualized in higher dimensions. However, the idea is the same. The tangent line at a point $x$ is the line that best approximates the function at $x$. This idea of linear (or really, affine) approximation carries over to higher dimensions.
You're familiar with the usual definition of a 1-variable derivative: $f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$. It might not be clear at first how to generalize this to multivariable functions, but hopefully it will be after we rearrange the above equation:
$$
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \iff 0 = \lim_{h \to 0} \frac{f(x + h) - f(x) - f'(x)h}{h} \, .
$$
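Before generalizing, here is a quick numerical sanity check of this rearranged form, a sketch with an arbitrarily chosen function and point (not from the text):

```python
# Check that (f(x+h) - f(x) - f'(x)*h) / h -> 0 as h -> 0,
# using the illustrative choice f(x) = x^2 (so f'(x) = 2x) at x = 3.
def f(x):
    return x * x

def fprime(x):
    return 2 * x

x = 3.0
for h in [0.1, 0.01, 0.001]:
    quotient = (f(x + h) - f(x) - fprime(x) * h) / h
    print(h, quotient)  # the quotient shrinks like h
```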
Note that for a fixed $x$, the function $L(h) = f'(x) h$ is just a line through the origin with slope $f'(x)$, which is an example of a linear map in the sense of linear algebra. In general, the derivative of a function $f : \mathbb{R}^m \to \mathbb{R}^n$ at a point ${x} \in \mathbb{R}^m$ is defined to be a linear map $Df_{x} : \mathbb{R}^m \to \mathbb{R}^n$ such that
$$
\lim_{h \to 0} \frac{f(x + h) - f(x) - Df_x(h)}{\|h\|} = 0
$$
where $\|h\|$ is the length of the vector $h \in \mathbb{R}^m$.
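To see this defining limit in action numerically, here is a small sketch; the function $f(x,y) = (xy,\, x + y^2)$, the point $(1,2)$, and the direction of $h$ are illustrative choices, with the Jacobian at $(1,2)$ worked out by hand as $\begin{pmatrix}2&1\\1&4\end{pmatrix}$:

```python
import math

# Numerically check the defining limit for f(x, y) = (x*y, x + y^2)
# at the illustrative point (1, 2); its Jacobian there is [[2, 1], [1, 4]].
def f(x, y):
    return (x * y, x + y ** 2)

def Df(hx, hy):           # the linear map Df_(1,2) applied to h = (hx, hy)
    return (2 * hx + 1 * hy, 1 * hx + 4 * hy)

x0, y0 = 1.0, 2.0
for t in [0.1, 0.01, 0.001]:
    hx, hy = t, -t        # a sample direction shrinking to 0
    fx, fy = f(x0 + hx, y0 + hy)
    gx, gy = f(x0, y0)
    lx, ly = Df(hx, hy)
    num = math.hypot(fx - gx - lx, fy - gy - ly)
    print(t, num / math.hypot(hx, hy))  # the ratio shrinks to 0 with t
```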
One can show that such a linear map is unique if it exists. One can also show that this linear map $Df_x$ can be represented as left multiplication by the Jacobian matrix
$$
[Df_x] =
\begin{pmatrix}
\left. \frac{\partial f_1}{\partial x_1} \right|_x & \cdots & \left. \frac{\partial f_1}{\partial x_m} \right|_x\\
\vdots & & \vdots\\
\left. \frac{\partial f_n}{\partial x_1} \right|_x & \cdots & \left. \frac{\partial f_n}{\partial x_m} \right|_x
\end{pmatrix}
$$
where the $f_i : \mathbb{R}^m \to \mathbb{R}$ are the component functions of $f$, i.e., $f(x) = (f_1(x), \ldots, f_n(x))$.
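The Jacobian can also be approximated entrywise by difference quotients. Here is a small sketch (the helper name `jacobian`, the one-sided scheme, and the test function are my own illustrative choices):

```python
# Sketch: approximate the Jacobian entrywise with one-sided difference
# quotients, df_i/dx_j ~ (f_i(x + eps*e_j) - f_i(x)) / eps.
def jacobian(f, x, eps=1e-6):
    fx = f(x)
    J = []
    for i in range(len(fx)):              # one row per component f_i
        row = []
        for j in range(len(x)):           # one column per input x_j
            xp = list(x)
            xp[j] += eps
            row.append((f(xp)[i] - fx[i]) / eps)
        J.append(row)
    return J

def f(v):
    x, y = v
    return [x * y, x + y ** 2]            # analytic Jacobian: [[y, x], [1, 2y]]

J = jacobian(f, [1.0, 2.0])
print(J)  # approximately [[2, 1], [1, 4]]
```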
Okay, after all those abstract definitions, let's consider your particular example. For a function $f : \mathbb{R}^2 \to \mathbb{R}$, the derivative can indeed be visualized as the tangent plane to the graph of $f$. If we write $z = f(x,y)$, or $0 = f(x,y) - z$, then points on the graph of $f$ have the form $(x,y,z) = (x,y,f(x,y))$. In this case, $Df_{(x,y)}$ is represented by the gradient $\nabla f|_{(x,y)} = \left(\left.\frac{\partial f}{\partial x}\right|_{(x,y)}, \left.\frac{\partial f}{\partial y}\right|_{(x,y)}\right)$. Letting $g(x,y,z) = f(x,y) - z$, we have $\nabla g = ([Df_{(x,y)}],-1) = \left(\left.\frac{\partial f}{\partial x}\right|_{(x,y)}, \left.\frac{\partial f}{\partial y}\right|_{(x,y)}, -1\right)$. This vector is orthogonal to the graph of $f$ (a level surface of $g$) and is therefore a normal vector to the tangent plane at the point $(x,y,f(x,y))$. Thus, from the very abstract definition of a derivative given above, we recover the intuitive idea that the tangent plane represents the derivative.
For instance, suppose we have the function $f(x,y) = x^2 + y^2$ and we'd like to find its derivative and tangent plane at the point $(2,-3)$. We compute the gradient $\nabla f = (2x, 2y)$, so $\left. \nabla f \right|_{(2,-3)} = (4, -6)$. Note that $f(2,-3) = 13$. Letting $g(x,y,z) = f(x,y) - z$ as above, we get $\left.\nabla g\right|_{(2,-3,13)} = (4, -6, -1)$. This is the normal vector of the tangent plane to the graph of $f$ at $(2,-3)$, which we compute as
$$
(4, -6, -1) \cdot (x,y,z) = (4, -6, -1) \cdot (2, -3, 13), \quad \text{i.e.,} \quad 4x - 6y - z = 8 + 18 - 13 = 13,
$$
so the tangent plane is given by $z = 4x - 6y - 13$.
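The worked example is easy to double-check numerically. A short sketch: the plane should touch the graph at the base point, and nearby the gap between graph and plane should shrink quadratically, as expected of a first-order approximation:

```python
# Verify the worked example: f(x, y) = x^2 + y^2 with tangent plane
# z = 4x - 6y - 13 at the point (2, -3, 13).
def f(x, y):
    return x ** 2 + y ** 2

def plane(x, y):
    return 4 * x - 6 * y - 13

# The plane touches the graph at (2, -3) ...
print(f(2, -3), plane(2, -3))    # both equal 13

# ... and near (2, -3) the gap f - plane shrinks quadratically in t.
for t in [0.1, 0.01]:
    print(t, f(2 + t, -3 + t) - plane(2 + t, -3 + t))
```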
For more background, I recommend Apostol's Mathematical Analysis.
There are functions for which all directional derivatives exist and which are still not differentiable. A web search will turn up several examples, including ones in which the directional derivatives not only all exist but are all equal. Differentiability fails because the limit in the definition must vanish along every path $h \to 0$, not just along straight lines. Even requiring continuity as well does not fix this: the function $f(x,y) = x^3/(x^2+y^2)$, extended by $f(0,0) = 0$, is continuous and has every directional derivative at the origin, yet is not differentiable there. Piling on extra conditions like this seems like using a bigger and bigger hammer to deal with each issue as it arises.
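One standard function of this kind (my own illustrative choice, not necessarily the example linked in any particular answer) is $f(x,y) = x^3/(x^2+y^2)$ with $f(0,0) = 0$: it is continuous, every directional derivative at the origin exists, yet the only candidate linear map, built from the partials $f_x(0,0) = 1$ and $f_y(0,0) = 0$, fails the defining limit. A numerical sketch:

```python
import math

# f(x, y) = x^3 / (x^2 + y^2), f(0, 0) = 0: continuous, all directional
# derivatives at the origin exist, yet not differentiable there.
def f(x, y):
    return 0.0 if x == y == 0 else x ** 3 / (x ** 2 + y ** 2)

# Directional derivative at 0 along the unit vector (a, b) exists:
# f(t*a, t*b) / t = a^3 for every t != 0.
a, b = math.cos(1.0), math.sin(1.0)   # an arbitrary unit direction
for t in [0.1, 0.001]:
    print(f(t * a, t * b) / t)        # stays at a**3, independent of t

# But the differentiability quotient with candidate map L(h) = h_x
# does NOT tend to 0 along (a, b); here ||h|| = t:
for t in [0.1, 0.001]:
    q = (f(t * a, t * b) - f(0, 0) - t * a) / t
    print(q)                          # stays at a**3 - a, which is nonzero
```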
I think the awkwardness in the text’s definition comes from treating $x$ and $y$ separately. If you formulate the definition of differentiability in terms of vectors, it doesn’t seem all that awkward to me: a function $f:\mathbb R^m\to\mathbb R^n$ is differentiable at $\mathbf v\in\mathbb R^m$ if there exists a linear map $L_{\mathbf v}:\mathbb R^m\to\mathbb R^n$ such that $f(\mathbf v+\mathbf h) = f(\mathbf v)+L_{\mathbf v}(\mathbf h)+o(\|\mathbf h\|)$ as $\mathbf h\to\mathbf 0$. If such a map exists, it isn’t hard to show that it is unique. This linear map is the differential of $f$ at $\mathbf v$, often denoted by $\mathrm df_{\mathbf v}$ or simply $\mathrm df$. This captures the essential notion that a derivative/differential is the best linear approximation to the change in the function’s value near a given point.
Note that when $f:\mathbb R\to\mathbb R$ this definition reduces to that of the derivative from elementary calculus (in one dimension, a linear map is just multiplication by a scalar). In addition, gradient, directional derivative, &c can all be defined in terms of this differential.
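To make that last point concrete, here is a small sketch of recovering a directional derivative from the differential; the function $f(x,y) = x^2 y$, the point $(1,3)$, and the direction are my own illustrative choices:

```python
# Sketch: for scalar f the differential df_v is represented by the
# gradient, and the directional derivative along u is just df_v(u) = grad . u.
# Illustrative choice: f(x, y) = x^2 * y at (1, 3), where grad f = (2xy, x^2) = (6, 1).
def f(x, y):
    return x ** 2 * y

grad = (6.0, 1.0)                       # gradient of f at (1, 3)
u = (3 / 5, 4 / 5)                      # a unit direction

df_u = grad[0] * u[0] + grad[1] * u[1]  # df(u) = grad . u
print(df_u)                             # 6*0.6 + 1*0.8 = 4.4

# Compare with the limit-quotient definition of the directional derivative:
t = 1e-6
approx = (f(1 + t * u[0], 3 + t * u[1]) - f(1, 3)) / t
print(approx)                           # close to 4.4
```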
Write $f(x,y) = e^{xy}$, and let $A = \left.\frac{\partial f}{\partial x}\right|_{(a,b)} = be^{ab}$ and $B = \left.\frac{\partial f}{\partial y}\right|_{(a,b)} = ae^{ab}$. Since $e^{(a+\Delta x)(b+\Delta y)} = e^{ab}e^{u}$ with $u = b\Delta x + a\Delta y + \Delta x\,\Delta y$, we have
$$
f(a + \Delta x, b + \Delta y) - f(a,b) - A \Delta x - B \Delta y = e^{ab}\left(e^{u} - 1 - b\Delta x - a\Delta y\right) = e^{ab}\left((e^u - 1 - u) + \Delta x\,\Delta y\right) \, .
$$
Writing $\|h\| = \sqrt{\Delta x^2 + \Delta y^2}$, we have $|\Delta x\,\Delta y| \le \|h\|^2$ and $e^u - 1 - u = O(u^2) = O(\|h\|^2)$, so the entire remainder is $O(\|h\|^2)$. Dividing by $\|h\|$ therefore gives something that tends to $0$ as $(\Delta x, \Delta y) \to (0,0)$, which establishes differentiability as per your definition, with $Df_{(a,b)}(\Delta x, \Delta y) = A\Delta x + B\Delta y$.
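This can be checked numerically as well. A sketch with an arbitrarily chosen base point $(a,b) = (0.5, -1.2)$:

```python
import math

# Sanity check: for f(x, y) = e^{xy} at the illustrative point
# (a, b) = (0.5, -1.2), the remainder
#   f(a+dx, b+dy) - f(a, b) - A*dx - B*dy,  A = b*e^{ab}, B = a*e^{ab},
# divided by ||(dx, dy)|| should tend to 0.
a, b = 0.5, -1.2
E = math.exp(a * b)
A, B = b * E, a * E

for t in [0.1, 0.01, 0.001]:
    dx, dy = t, t
    rem = math.exp((a + dx) * (b + dy)) - E - A * dx - B * dy
    print(t, abs(rem) / math.hypot(dx, dy))   # the ratio shrinks with t
```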