Let’s back up a bit. As Hans Ludmark points out in his comment above, the basic definition of the directional derivative in the direction specified by the unit vector $\mathbf u=(u_1,u_2)$ at a point $P=(a,b)$ is via a limit similar to the one from elementary calculus: $${\partial f\over\partial\mathbf u}(a,b)=\lim_{h\to0}{f(a+hu_1,b+hu_2)-f(a,b)\over h}.$$ As you’ve observed, this amounts to taking a vertical slice through the surface and then computing the ordinary derivative of that slice, as illustrated below.
![directional derivative](https://i.stack.imgur.com/N6qEX.png)
This derivative is, of course, the slope of the tangent line (blue) to the slice at that point. Observe that this line is also the intersection of the tangent plane at that point (grayish blue) with the cutting plane (violet), so we can interpret the directional derivative as the steepness of the tangent plane in a given direction. As you rotate the cutting plane around $P$, the slope of this line changes, reaching a maximum when the two planes are perpendicular, as we’ll see below. (You can also see that this is the case by visualizing cutting a cylinder parallel to the $z$-axis by a plane and imagining what happens to the high point as you move that plane around.)
Let’s say that the tangent plane is given by the equation $\lambda x+\mu y-z=d$ with normal $\mathbf n_t=(\lambda,\mu,-1)$. A normal to the cutting plane is $\mathbf n_c=(-u_2,u_1,0)$, which is just $\mathbf u$ rotated ninety degrees. In $\mathbb R^3$ we can find the direction of the line of intersection via a cross product: $$\mathbf n_t\times\mathbf n_c=(u_1,u_2,\lambda u_1+\mu u_2)$$ and the slope of this line is thus $${\lambda u_1+\mu u_2\over\sqrt{u_1^2+u_2^2}}=\lambda u_1+\mu u_2=(\lambda,\mu)\cdot\mathbf u=\|(\lambda,\mu)\|\cos\phi,$$ where $\phi$ is the angle between the projection of $\mathbf n_t$ onto the $x$-$y$ plane and $\mathbf u$. The slope is therefore maximal when $\phi=0$, i.e., when $\mathbf u$ and the projection of $\mathbf n_t$ point in the same direction. In that case $\mathbf n_t\cdot\mathbf n_c=-\lambda u_2+\mu u_1=0$, so the two planes are perpendicular. The maximum value of this slope is $\|(\lambda,\mu)\|$.
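The cross-product computation above can be checked numerically. Here is a minimal sketch in plain Python; the coefficient values `lam = 3`, `mu = 4` and the helper names `cross` and `slope_along` are assumptions chosen for illustration.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def slope_along(lam, mu, theta):
    """Slope of the line where the plane lam*x + mu*y - z = d meets the
    vertical cutting plane with horizontal direction u = (cos theta, sin theta)."""
    u1, u2 = math.cos(theta), math.sin(theta)
    n_t = (lam, mu, -1.0)   # normal to the tangent plane
    n_c = (-u2, u1, 0.0)    # normal to the cutting plane
    dx, dy, dz = cross(n_t, n_c)
    return dz / math.hypot(dx, dy)   # rise over horizontal run

lam, mu = 3.0, 4.0   # assumed example coefficients
thetas = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(thetas, key=lambda t: slope_along(lam, mu, t))

print(slope_along(lam, mu, 0.0))     # slope in the x-direction
print(slope_along(lam, mu, best))    # largest slope found on the grid
print(best, math.atan2(mu, lam))     # maximizing angle vs. angle of (lam, mu)
```

On this grid of directions, the largest slope should come out close to $\|(\lambda,\mu)\|$, at an angle close to that of $(\lambda,\mu)$.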
This is where the gradient of $f$ comes in. If we write the equation of the surface as $F(x,y,z)=f(x,y)-z=0$, then $\nabla F=(f_x,f_y,-1)$ is normal to the surface, so an equation of the tangent plane at $(a,b,f(a,b))$ is $$xf_x(a,b)+yf_y(a,b)-z=af_x(a,b)+bf_y(a,b)-f(a,b).$$ This is exactly in the form analyzed above, with $\lambda=f_x(a,b)$ and $\mu=f_y(a,b)$, so $${\partial f\over\partial\mathbf u}(a,b)=\nabla f(a,b)\cdot\mathbf u$$ with the maximal rate of change given by $\|\nabla f(a,b)\|$.
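The identity ${\partial f\over\partial\mathbf u}=\nabla f\cdot\mathbf u$ can be sanity-checked against the limit definition. The sketch below uses an example function `f`, its hand-computed gradient `grad_f`, and a helper `directional_limit`, all of which are assumptions for illustration. It uses a symmetric difference quotient at a small fixed `h` as a stand-in for the limit.

```python
import math

def f(x, y):
    # Example surface (an assumption for illustration)
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # Hand-computed partial derivatives of the example f
    return (2 * x * y, x**2 + math.cos(y))

def directional_limit(f, a, b, u, h=1e-6):
    """Symmetric difference quotient approximating the limit definition."""
    u1, u2 = u
    return (f(a + h * u1, b + h * u2) - f(a - h * u1, b - h * u2)) / (2 * h)

a, b = 1.0, 2.0
theta = math.radians(30)
u = (math.cos(theta), math.sin(theta))   # unit vector

fx, fy = grad_f(a, b)
print(directional_limit(f, a, b, u))   # numerical estimate of the limit
print(fx * u[0] + fy * u[1])           # gradient dotted with u
```

The two printed values should agree to many decimal places.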
This seems awfully coincidental, but it’s not. Going back to the plane equation $\lambda x+\mu y-z=d$ above, the coefficients $\lambda$ and $\mu$ are respectively the “$x$-slope” and “$y$-slope,” i.e., the slopes of the plane's intersections with vertical planes parallel to the $x$- and $y$-axes. These slopes are encoded in the normal $(\lambda,\mu,-1)$. For the tangent plane, these slopes are the directional derivatives in the directions of the coordinate axes, also known as the partial derivatives of $f$.
The value of the directional derivative in its direction of greatest increase (i.e., as you said, along the direction of the gradient) is just that. It is the rate of change of the function along that direction. This answers the question, 'if I move a tiny bit $\epsilon$ in the direction of greatest increase, how does the function change?' The answer is that it changes by approximately $\epsilon$ times the derivative in the direction of greatest increase.
It seems something is confusing you, so perhaps a quick recap of the directional derivative is in order. To avoid clutter, say we're considering the derivative at the origin. Then we have the first-order Taylor expansion $$ f(\mathbf x) \approx f(\mathbf 0) + \nabla f(\mathbf 0)\cdot\mathbf x$$ where $\nabla f$ is the gradient. Recall that the directional derivative of $f$ in the direction of a unit vector $\hat{\mathbf u}$ is $D_{\hat{\mathbf u}}f = \nabla f\cdot\hat{\mathbf u}.$ Thus if we move a small amount $\epsilon $ in the direction $\hat{\mathbf u}$ from the origin, then $\mathbf x = \epsilon\hat{\mathbf u}$ and we have $$ f(\mathbf x)-f(\mathbf 0) \approx \nabla f(\mathbf 0)\cdot(\epsilon\hat{\mathbf u}) = \epsilon D_{\hat{\mathbf u}}f(\mathbf 0).$$
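To see how good the first-order estimate is, one can compare the actual change in $f$ with $\epsilon D_{\hat{\mathbf u}}f$ for shrinking $\epsilon$. This is a rough sketch with an assumed example function `f`, whose gradient at the origin, $(1,0)$, was computed by hand. The helper name `taylor_error` is likewise hypothetical.

```python
import math

def f(x, y):
    # Example function (assumed for illustration)
    return math.exp(x) * math.cos(y) + x * y

grad0 = (1.0, 0.0)   # gradient of the example f at the origin, computed by hand
u = (0.6, 0.8)       # unit direction
d_u = grad0[0] * u[0] + grad0[1] * u[1]   # directional derivative at the origin

def taylor_error(eps):
    """Actual change in f minus the first-order prediction eps * D_u f."""
    actual = f(eps * u[0], eps * u[1]) - f(0.0, 0.0)
    return actual - eps * d_u

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, eps * d_u, taylor_error(eps))
```

The error column should shrink roughly like $\epsilon^2$, which is why the prediction becomes accurate for small steps.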
Why is the gradient the direction of greatest derivative? Well, the derivative is the dot product of the gradient with the direction... of course that's largest when the direction is parallel to the gradient.
And what is the value of the derivative in this direction? It's just the gradient dotted into the unit vector in the same direction, so it's the magnitude of the gradient. Going a little more explicitly, the direction of maximum increase is $\hat{\mathbf u}_{max} = \frac{\nabla f}{|\nabla f|}$ so we have a maximal derivative $$D_{max} = \nabla f\cdot \hat{\mathbf u}_{max} = \frac{\nabla f\cdot \nabla f}{|\nabla f|} = |\nabla f|. $$ So the magnitude of the gradient represents the rate of change in the direction of greatest increase. And as discussed before, the direction of the gradient is the direction of greatest increase. Thus, both the magnitude and direction have nice interpretations.
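As a quick numerical illustration of $D_{max}=|\nabla f|$, the sketch below fixes an assumed gradient value `grad = (2, -1, 2)` (so $|\nabla f|=3$) and compares the derivative along $\hat{\mathbf u}_{max}$ with the derivative along many random unit directions. The helpers `dot` and `normalize` are illustrative names, and the random sampling is only a spot check, not a proof.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)

grad = (2.0, -1.0, 2.0)      # an assumed gradient value with |grad| = 3
u_max = normalize(grad)      # direction of greatest increase
d_max = dot(grad, u_max)     # derivative in that direction

rng = random.Random(0)       # fixed seed so the run is repeatable
samples = [normalize(tuple(rng.gauss(0, 1) for _ in grad))
           for _ in range(10_000)]
largest_sampled = max(dot(grad, u) for u in samples)

print(d_max)             # derivative along the gradient direction
print(largest_sampled)   # best value found among the random unit directions
```

No sampled direction should beat the gradient direction, in line with the dot-product argument.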
Best Answer
You appear to be asking two questions, one about the directional derivative, the other about the dot product. Since your question appears to be mostly about the directional derivative, I will give an answer explaining the geometric meaning of the directional derivative. For ease of visualization I restrict attention to functions $\mathbb{R}^{2}\to\mathbb{R},$ but $\mathbb{R}^{2}$ can in principle be replaced by $\mathbb{R}^{n}.$
As always, a picture is worth a thousand words: if you can glean the meaning of the directional derivative from this .gif I found online then you may not need to read what I have to say. I hope the animation helps either way.
It's worth noting that the little green segment at the base of the figure in this animation is meant to represent the value of the gradient of the function at the given point.
Let $\mathbf{v}\in\mathbb{R}^{2}$ be a unit vector, considered as a vector in the $(x,y)$-plane in $\mathbb{R}^{3}.$ The vector $\mathbf{v}$ determines a unique plane $\Pi = \Pi(\mathbf{v})$ which contains the origin, the point $(0,0,1)$ and $\mathbf{v}$ itself. For example, if $\mathbf{v}=\begin{pmatrix}1\\0\end{pmatrix}$ then the plane $\Pi$ is the $y=0$ plane, i.e., the $(x,z)$-plane. (Draw a sketch to make sure you are comfortable with this idea; if you think I have explained it poorly, please ask.)
We want to consider this same plane, but "going through $\mathbf{x}$", that is, we want to consider the plane $\mathbf{x}+\Pi = \{\mathbf{x} + \mathbf{p}:\mathbf{p}\in\Pi\},$ where $\mathbf{x}$ is regarded as a point of the $(x,y)$-plane in $\mathbb{R}^{3}.$ All we've done is take the plane $\Pi,$ which went through the origin, and physically move it so that the point on $\Pi$ which was previously at $(0,0,0)$ is now at $\mathbf{x}.$
Consider the directional derivative of $f\colon\mathbb{R}^{2}\to\mathbb{R}$ at $\mathbf{x}\in\mathbb{R}^{2}$ in the direction of $\mathbf{v}.$ Its value is equal to the value of the limit: $$\lim_{h\to0}\frac{f(\mathbf{x}+h\mathbf{v})-f(\mathbf{x})}{h}.$$ This can be taken as the definition of the directional derivative; when $f$ is differentiable at $\mathbf{x},$ the above expression turns out to be equal to $\nabla{f}\cdot\mathbf{v}.$ [Note: We take $\mathbf{v}$ to be a unit vector so that this value is a rate of change per unit distance. If $\mathbf{v}$ is not a unit vector then the limit is scaled by $\|\mathbf{v}\|,$ and we need to divide by $\|\mathbf{v}\|$ to account for this.]
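The effect of the unit-length convention can be seen numerically. In this sketch the example function `f`, the point `x`, and the helper `quotient` are assumptions for illustration. The vectors `v` and `w` point in the same direction, but only `v` has length one.

```python
import math

def f(x, y):
    # Example function (assumed for illustration)
    return x * y**2 + 3 * x

def quotient(f, x, v, h=1e-6):
    """One-sided difference quotient from the limit definition, at small h."""
    return (f(x[0] + h * v[0], x[1] + h * v[1]) - f(x[0], x[1])) / h

x = (1.0, 2.0)
v = (0.6, 0.8)   # unit vector
w = (3.0, 4.0)   # same direction, length 5

along_v = quotient(f, x, v)
along_w = quotient(f, x, w)
print(along_v)                     # slope per unit distance
print(along_w)                     # larger by the factor |w|
print(along_w / math.hypot(*w))    # rescaled back to the unit-vector value
```

Dividing by the length of `w` should recover approximately the value obtained with the unit vector `v`.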
Consider the graph of $f,$ that is, the set $$\Gamma = \Gamma(f) = \{(x,y,z)\in\mathbb{R}^{3}: z=f(x,y)\}.$$ This is (more-or-less) a surface in $3$-space. If we look at the points where the graph of $f$ intersects $\mathbf{x}+\Pi,$ then we get a curve in space, which we'll call $\gamma = \gamma(f,\mathbf{v}) = \Gamma\cap(\mathbf{x}+\Pi).$
We can imagine the $z$-axis in $\mathbb{R}^{3},$ which is contained in $\Pi,$ together with the line $\ell = \{k\mathbf{v}:k\in\mathbb{R}\},$ defining a coordinate system on $\Pi$: the vertical axis is given by the $z$-axis, and the horizontal axis is given by $\ell,$ with $0$ at the origin and the values on $\ell$ increasing in the direction of $\mathbf{v}.$
That is, we can imagine the curve $\gamma$ as being the graph of some function $\mathbb{R}\to\mathbb{R},$ except that instead of drawing this graph on the usual $(x,y)$-plane we are drawing it on $\mathbf{x}+\Pi.$
The expression for the directional derivative given as a limit is telling you that the value of the directional derivative (of $f$ at $\mathbf{x}$ in the direction of $\mathbf{v}$) is the slope, under the coordinate system I have described above, of the line which is contained in $\mathbf{x}+\Pi$ and tangent to the curve $\gamma,$ that is, "the slope of the tangent line" in an appropriate plane and coordinate system.
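The slice picture can be made concrete in a few lines. The sketch below, with an assumed example function `f` and a hypothetical helper `gamma`, builds points of the curve $\gamma$ in $\mathbb{R}^{3}$, checks that they lie in the vertical plane $\mathbf{x}+\Pi,$ and then estimates the slope of $\gamma$ in the plane's own coordinates (distance $t$ along $\mathbf{v}$ horizontally, $z$ vertically).

```python
import math

def f(x, y):
    # Example function (assumed for illustration)
    return math.sin(x) + x * y

def gamma(x, v, t):
    """Point of the slice curve: move t along v from x, then lift to the graph."""
    px, py = x[0] + t * v[0], x[1] + t * v[1]
    return (px, py, f(px, py))

x = (0.5, 1.0)
v = (math.cos(1.0), math.sin(1.0))   # unit vector at an angle of 1 radian
normal = (-v[1], v[0], 0.0)          # normal to the vertical plane x + Pi

def plane_residual(p):
    """Zero exactly when p lies in the plane x + Pi."""
    return normal[0] * (p[0] - x[0]) + normal[1] * (p[1] - x[1])

for t in (-1.0, 0.0, 0.5):
    print(plane_residual(gamma(x, v, t)))   # should be zero up to rounding

# In plane coordinates (t, z), gamma is the graph of t -> f(x + t v).
h = 1e-6
slope = (gamma(x, v, h)[2] - gamma(x, v, -h)[2]) / (2 * h)
grad = (math.cos(x[0]) + x[1], x[0])   # hand-computed gradient of the example f
print(slope, grad[0] * v[0] + grad[1] * v[1])
```

The estimated slope of the tangent line to $\gamma$ should match $\nabla f\cdot\mathbf{v}$ up to the error of the difference quotient.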