I have written a somewhat eccentric course on this, which can be found here:
http://ximera.osu.edu/course/kisonecat/m2o2c2/course/activity/week1/
A short answer to your question, though, is that if $f:\mathbb{R}^n \to \mathbb{R}$, then $D^{k+1} f$ is locally a $(k+1)$-linear function with the property that
$$
D^{k}f\big|_{p+v_{k+1}}(v_1,v_2,...,v_k) \approx D^{k}f\big|_{p}(v_1,v_2,...,v_k) +D^{k+1}f\big|_{p}(v_1,v_2,...,v_k,v_{k+1})
$$
In other words, the $(k+1)$st derivative measures changes in the $k$th derivative.
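Here is a numerical illustration of the $k=1$ case of that approximation (a sketch only; the function $f(x,y) = x^2 y$, the point, and the vectors are my own example choices, not from the answer): $Df\big|_{p+w}(v) \approx Df\big|_p(v) + D^2f\big|_p(v,w)$.

```python
# Check that D^2 f measures the change in Df:
#   Df|_{p+w}(v)  ≈  Df|_p(v) + D^2 f|_p(v, w)
# Example function (an assumption): f(x, y) = x**2 * y.

def grad(x, y):                  # components of Df at (x, y)
    return (2*x*y, x**2)

def hessian(x, y):               # components of D^2 f at (x, y)
    return ((2*y, 2*x), (2*x, 0.0))

def Df(p, v):                    # Df|_p as a 1-linear map
    g = grad(*p)
    return g[0]*v[0] + g[1]*v[1]

def D2f(p, v, w):                # D^2 f|_p as a bilinear map
    H = hessian(*p)
    return sum(H[i][j]*v[i]*w[j] for i in range(2) for j in range(2))

p, v = (1.0, 2.0), (0.3, -0.5)
w = (1e-4, 2e-4)                 # small displacement of the base point

lhs = Df((p[0] + w[0], p[1] + w[1]), v)
rhs = Df(p, v) + D2f(p, v, w)
print(abs(lhs - rhs))            # O(|w|^2), i.e. tiny
```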
To write things out in a basis, in Einstein notation, we have
$$D^k f = f_{i_1i_2...i_k} dx^{i_1} \otimes dx^{i_2} \otimes ... \otimes dx^{i_k}$$
where $f_{i_1i_2...i_k}$ is the higher partial derivative of $f$ with respect to $x_{i_1}$ then $x_{i_2}$, etc.
I should note that the multivariable Taylor's theorem becomes especially easy to write down using this formalism:
$$
f(x+h) = f(x)+Df\big|_x (h)+\frac{1}{2!} D^2 f\big|_x (h,h)+\frac{1}{3!} D^3 f\big|_x (h,h,h)+...
$$
This may also illuminate the presence of $\frac{1}{k!}$ in Taylor's theorem: since $D^k f$ is symmetric, it accounts for the $k!$ permutations of the $k$ arguments.
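The expansion above can be checked numerically. This is only a sketch: the function $f(x,y) = x^2 y$, the point, and the step are my own example choices, and the series is truncated after the second-order term, so the agreement is to $O(|h|^3)$.

```python
# Truncated multivariable Taylor expansion:
#   f(x+h) ≈ f(x) + Df|_x(h) + (1/2) D^2 f|_x(h, h)
# Example function (an assumption): f(x, y) = x**2 * y.

def f(x, y):
    return x**2 * y

def Df(p, h):                     # Df|_p(h) = grad f . h
    x, y = p
    return 2*x*y*h[0] + x**2*h[1]

def D2f(p, h, k):                 # D^2 f|_p(h, k) = h . Hessian . k
    x, y = p
    return 2*y*h[0]*k[0] + 2*x*(h[0]*k[1] + h[1]*k[0])

p, h = (1.0, 2.0), (0.01, 0.02)
exact   = f(p[0] + h[0], p[1] + h[1])
taylor2 = f(*p) + Df(p, h) + D2f(p, h, h) / 2
print(exact, taylor2)             # agree to O(|h|^3)
```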
The value of the directional derivative in the direction of greatest increase (i.e., as you said, along the direction of the gradient) is just that: the rate of change of the function along that direction. This answers the question, 'if I move a tiny bit $\epsilon$ in the direction of greatest increase, how does the function change?' The answer is that it changes by $\epsilon$ times the derivative in the direction of greatest increase.
It seems something is confusing you, so perhaps a quick recap of the directional derivative is in order. To avoid clutter, say we're considering the derivative at the origin. Then we have the first-order Taylor expansion $$ f(\mathbf x) \approx f(\mathbf 0) + \nabla f(\mathbf 0)\cdot\mathbf x$$ where $\nabla f$ is the gradient. Recall that the directional derivative of $f$ in the direction of a unit vector $\hat{\mathbf u}$ is $D_{\hat{\mathbf u}}f = \nabla f\cdot\hat{\mathbf u}.$ Thus if we move a small amount $\epsilon$ in the direction $\hat{\mathbf u}$ from the origin, then $\mathbf x = \epsilon\hat{\mathbf u}$ and we have $$ f(\mathbf x)-f(\mathbf 0) \approx \nabla f(\mathbf 0)\cdot(\epsilon\hat{\mathbf u}) = \epsilon D_{\hat{\mathbf u}}f(\mathbf 0).$$
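A quick numerical check of that last identity (a sketch; the function $f(x,y) = x^2 y$, the base point, and the direction are my own example choices — the answer works at the origin, but any base point behaves the same way):

```python
# Check f(p + eps*u) - f(p) ≈ eps * D_u f(p) for a unit vector u.
# Example function (an assumption): f(x, y) = x**2 * y.

def f(x, y):
    return x**2 * y

def grad_f(x, y):
    return (2*x*y, x**2)

p = (1.0, 2.0)                    # example base point
u = (0.6, 0.8)                    # a unit vector
eps = 1e-4

gx, gy = grad_f(*p)
Du = gx*u[0] + gy*u[1]            # directional derivative  D_u f = grad f . u
change = f(p[0] + eps*u[0], p[1] + eps*u[1]) - f(*p)
print(change, eps*Du)             # agree to O(eps^2)
```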
Why is the gradient the direction of greatest derivative? Well, the derivative is the dot product of the gradient with the unit direction, $\nabla f\cdot\hat{\mathbf u} = |\nabla f|\cos\theta$... of course that's largest when $\theta = 0$, i.e. when the direction is parallel to the gradient.
And what is the value of the derivative in this direction? It's just the gradient dotted into the unit vector in the same direction, so it's the magnitude of the gradient. Going a little more explicitly, the direction of maximum increase is $\hat{\mathbf u}_{max} = \frac{\nabla f}{|\nabla f|}$ so we have a maximal derivative $$D_{max} = \nabla f\cdot \hat{\mathbf u}_{max} = \frac{\nabla f\cdot \nabla f}{|\nabla f|} = |\nabla f|. $$ So the magnitude of the gradient represents the rate of change in the direction of greatest increase. And as discussed before, the direction of the gradient is the direction of greatest increase. Thus, both the magnitude and direction have nice interpretations.
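Both interpretations can be confirmed by brute force: scan many unit directions and see where the directional derivative peaks. This is a sketch — the gradient $(4, 1)$ below is that of the example choice $f(x,y) = x^2 y$ at the point $(1, 2)$, neither of which comes from the answer.

```python
import math

# Scan unit directions u = (cos t, sin t); the directional derivative
# D_u f = grad f . u should peak along grad f, with peak value |grad f|.
gx, gy = 4.0, 1.0                 # grad f at (1, 2) for the example f = x**2 * y
gnorm = math.hypot(gx, gy)

best_D, best_t = max(
    (gx*math.cos(t) + gy*math.sin(t), t)
    for t in (2*math.pi*k/3600 for k in range(3600))
)
print(best_D, gnorm)              # maximal D_u f vs |grad f|: essentially equal
print(best_t, math.atan2(gy, gx)) # maximizing angle vs angle of grad f
```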
Best Answer
On a single-variable real number line, you actually have two directions. Given a function $f$ with a positive derivative at $x_0,$ if you increase $x$ very slightly from $x_0$ then you increase $f(x),$ but if you decrease $x$ then you decrease $f(x).$ An increase is a positive change, a decrease is a negative change, and a positive change is never equal to a negative one.
So why do we say there is only one derivative of $f$ at $x_0$ when we can change $f(x)$ in two different ways? It's because the way we measure the rate of change makes them the same rate. For example, with the function $f(x) = 2x$ (choosing a linear function so we don't have to worry about the "limit" so much), if the change in $x$ is $+1$ (increasing) then the change in $f(x)$ is $+2,$ whereas if the change in $x$ is $-1$ (decreasing) then the change in $f(x)$ is $-2.$ But the rate in each case is $$ \frac{2}{1} = \frac{-2}{-1} = 2. $$
If we take the complex function $f(z) = 2z,$ if the change in $z$ is $+1$ then the change in $f(z)$ is $+2,$ if the change in $z$ is $+i$ then the change in $f(z)$ is $+2i,$ etc. No matter how we change $z,$ it turns out $f(z)$ always changes by exactly $2$ times the change in $z.$
For the function $g(z) = 2iz,$ it turns out that no matter which way $z$ changes, $g(z)$ changes by exactly $2i$ times as much. If we add $1$ to the real part of $z$ without changing the imaginary part, then the real part of $g(z)$ doesn't change at all, but the imaginary part increases by $2.$ If we add $1$ to the imaginary part of $z$ without changing the real part, then the imaginary part of $g(z)$ doesn't change at all, but the real part decreases by $2,$ because $2i \cdot i = -2.$
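This direction-independence is easy to see numerically. The following sketch checks the difference quotients of $f(z) = 2z$ and $g(z) = 2iz$ along three different directions of $dz$ (the base point $z_0$ and the step size are my own example choices):

```python
# Since f and g are linear, the difference quotient is exactly the constant
# derivative, whatever the direction of dz.
f = lambda z: 2 * z
g = lambda z: 2j * z
z0 = 1.3 + 0.7j
steps = (1e-3, 1e-3j, 1e-3 * (0.6 + 0.8j))   # three "directions" of dz

quot_f = [(f(z0 + dz) - f(z0)) / dz for dz in steps]
quot_g = [(g(z0 + dz) - g(z0)) / dz for dz in steps]
print(quot_f)                     # all (essentially) 2
print(quot_g)                     # all (essentially) 2i
```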
In your example of a function $f: \mathbb R^2 \to \mathbb R^2,$ you actually set up two independent real-valued functions on $\mathbb R^2,$ each of which has to be independently differentiable in order for $f$ to be differentiable. But when we have a function $f: \mathbb C \to \mathbb C,$ only the single complex output value $f(z)$ has to be differentiable. A "change by $dz$" in any "direction" will result in the same rate of change $\frac{df}{dz}$; it's just that if $\frac{df}{dz}\neq 0,$ some "directions" of $dz$ multiplied by $\frac{df}{dz}$ will produce more change to apply to the real part of $f$, and some will produce more change to apply to the imaginary part of $f.$
It turns out that differentiable functions from $\mathbb R^2$ to $\mathbb R^2$ don't generally correspond to differentiable functions from $\mathbb C$ to $\mathbb C,$ because the real and imaginary parts of the output of a differentiable complex function cannot change independently; their changes have to be coordinated in order to have only one rate of change at any given point. So complex differentiability is a stricter condition, viewed in that way; but it is a condition that plenty of functions can meet.
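The classic example of this gap is complex conjugation: as a map $\mathbb R^2 \to \mathbb R^2$ it is as smooth as can be, yet its complex difference quotient depends on the direction of $dz$. A small sketch (the base point $z_0$ and step size are my own example choices):

```python
# z -> conj(z) is smooth as an R^2 -> R^2 map, but its difference quotient
# depends on the direction of dz, so it has no complex derivative.
conj = lambda z: z.conjugate()
z0 = 0.5 + 0.2j

q_real_dir = (conj(z0 + 1e-3)  - conj(z0)) / 1e-3     # step along the real axis
q_imag_dir = (conj(z0 + 1e-3j) - conj(z0)) / 1e-3j    # step along the imaginary axis
print(q_real_dir, q_imag_dir)     # ≈ +1 and ≈ -1: no single rate of change
```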