Let $f: D\subseteq \Bbb R^n \to \Bbb R$ be differentiable. Then
$$\begin{align}\require{cancel}\nabla_{\vec v} f(\vec x_0) &= \lim_{h\to 0} \frac{f(\vec x_0 + h\vec v)-f(\vec x_0)}{h} \\ &= \lim_{h\to 0} \frac{\left(\color{red}{\cancel {\color{black}{f(\vec x_0)}}}+\nabla f(\vec x_0)\cdot (h\vec v) + o(h)\right) - \color{red}{\cancel {\color{black}{f(\vec x_0)}}}}{h} \\ &= \lim_{h\to 0} \frac{\color{red}{\cancel {\color{black}{h}}}\left[\nabla f(\vec x_0)\cdot (\vec v)\right]}{\color{red}{\cancel {\color{black}{h}}}} + \cancelto{0}{\lim_{h\to 0}\frac{o(h)}{h}} \\ &= \nabla f(\vec x_0) \cdot \vec v\end{align}$$
You don't have to use a unit vector to calculate the directional derivative, but the directional derivative will only correspond to the geometric idea of slope if $\vec v$ is a unit vector.
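As a quick numerical sanity check of the identity above (the function $f$, point, and direction here are my own hypothetical choices, not from the answer), the difference quotient from the limit definition should agree with the dot-product formula:

```python
import math

# Hypothetical example: f(x, y) = x^2 + 3y, with gradient (2x, 3).
def f(x, y):
    return x**2 + 3*y

def grad_f(x, y):
    return (2*x, 3)

x0, y0 = 1.0, 2.0
v = (1/math.sqrt(2), 1/math.sqrt(2))   # unit direction vector

# Difference quotient from the limit definition, with small h
h = 1e-6
dq = (f(x0 + h*v[0], y0 + h*v[1]) - f(x0, y0)) / h

# Dot-product formula: grad f(x0) . v
g = grad_f(x0, y0)
dot = g[0]*v[0] + g[1]*v[1]

print(dq, dot)   # both close to 5/sqrt(2) ~ 3.5355
```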
Edit: I assume that you are familiar with Taylor's theorem. Recall that the first order Taylor expansion of a function $g: \Bbb R\to \Bbb R$ around $a$ is
$$g(a+h) = g(a) + g'(a)h + o(h)$$
Here $o(h)$ is a stand-in for the remainder function $g(a+h)-g(a)-g'(a)h$. This notation (called little oh notation) tells us that the remainder has the property $$\lim_{h\to 0}\frac{g(a+h)-g(a)-g'(a)h}{h} = 0.$$
For functions of a vector variable, there's a similar Taylor expansion:
$$f(\vec a + \vec h) = f(\vec a) + \nabla f(\vec a)\cdot \vec h + o(\|\vec h\|)$$
So what I'm doing above is replacing $f(\vec x_0 + h\vec v)$ with its first order Taylor expansion. Then two terms cancel, one tends to zero, and we're left with the identity you're looking for.
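The little-oh property of the remainder can also be checked numerically (a small sketch with $g = \sin$ and $a = 0.5$, my own hypothetical choice): the ratio of the remainder to $h$ should shrink as $h$ does.

```python
import math

# Check that the first-order Taylor remainder is o(h),
# using g = sin with g' = cos at a = 0.5 (hypothetical example).
g, dg = math.sin, math.cos
a = 0.5

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    remainder = g(a + h) - g(a) - dg(a)*h
    print(h, remainder / h)   # the ratio shrinks roughly linearly with h
```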
The limit $\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t$ gives the definition of the derivative in the direction of the unit vector $v$ at $x=x_0\in \mathbb R^n$, that is $\frac{\partial}{\partial v} f (x_0)$.
The formula
$$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v$$
gives a property which is valid under the hypothesis that $f$ is differentiable at $x=x_0$, and is quite useful for calculations. (If $f$ is not differentiable at $x=x_0$, then that relation need not be true, even if all directional derivatives exist.)
The idea of the proof is that, since $f$ is differentiable at $x_0$, the gradient $\nabla f(x_0)$ exists and
$$\lim_{x\to x_0}\frac{|f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)|}{||x-x_0||}=0$$
Consider the point $x=x_0+tv$ (for fixed $x_0$ and $v$). The definition of the directional derivative (after subtracting and adding $\nabla f(x_0)\cdot (x_0+tv-x_0)$) leads to
$$\frac{\partial}{\partial v} f (x_0)=\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t=$$
$$=\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}+\frac{\nabla f(x_0)\cdot(x_0+tv-x_0)}{t}.$$
Because the limit of the first summand is $0$ (why?) (*) and the second one is constant, the result is $$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v,$$
which gives the usual formula.
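The two summands in the splitting above can be watched numerically (a sketch with a hypothetical differentiable $f(x,y)=xy+y^2$ at $x_0=(1,1)$, my own example): the first summand vanishes as $t\to 0$ from either side, while the second stays equal to $\nabla f(x_0)\cdot v$.

```python
import math

# Sketch of the two summands in the proof above, for a hypothetical
# differentiable f(x, y) = x*y + y**2 at x0 = (1, 1).
def f(x, y):
    return x*y + y**2

x0 = (1.0, 1.0)
grad = (x0[1], x0[0] + 2*x0[1])       # grad f = (y, x + 2y), here (1, 3)
v = (0.6, 0.8)                        # unit vector: 0.36 + 0.64 = 1

for t in (1e-5, -1e-5):               # both signs of t
    x = (x0[0] + t*v[0], x0[1] + t*v[1])
    num = f(*x) - f(*x0) - (grad[0]*(x[0]-x0[0]) + grad[1]*(x[1]-x0[1]))
    # first summand: [remainder / ||tv||] * [|t| ||v|| / t], i.e. num / t
    first = num / (abs(t) * math.hypot(*v)) * (abs(t) * math.hypot(*v) / t)
    second = (grad[0]*t*v[0] + grad[1]*t*v[1]) / t
    print(t, first, second)           # first -> 0, second = grad . v = 3.0
```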
What might be more interesting to understand this relation is when there's no such relation. Let $f \colon \mathbb R^2 \to \mathbb R$, and
$$f(x,y)=
\begin{cases}
\tfrac{x^2y}{x^2+y^2} & (x,y)\neq (0,0) \\
0 & (x,y)=(0,0). \\
\end{cases}$$
An easy calculation using the definition shows that, for a direction $v=(v_x,v_y)$ (let's assume $||v||=1$), the directional derivative is
$$\frac{\partial}{\partial v} f (0,0)=\frac{v_x^2 v_y}{v_x^2+v_y^2}=v_x^2 v_y$$
(in particular, both $\frac{\partial}{\partial x} f (0,0)$ and $\frac{\partial}{\partial y} f (0,0)$ are zero, that is, $\nabla f(0,0)=(0,0)$).
So, if the 'dot-product formula' were valid, it should be the case that $$\frac{\partial}{\partial v} f (0,0)=(0,0)\cdot (v_x,v_y)=0,$$
which only happens in the directions of the $x$ and $y$ axes. (BTW, this also proves that $f$ is not differentiable at $(0,0)$.)
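This counterexample is easy to probe numerically (the step size below is my own choice): along the axes the directional derivative at the origin is $0$, consistent with $\nabla f(0,0)=(0,0)$, but along the diagonal it is $v_x^2 v_y = 1/(2\sqrt 2)\neq 0$, contradicting the dot-product formula.

```python
import math

# Numeric check of the counterexample: f(x,y) = x^2 y / (x^2 + y^2), f(0,0) = 0.
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x**2 * y / (x**2 + y**2)

def dd_at_origin(vx, vy, t=1e-8):
    # difference quotient from the definition of the directional derivative
    return (f(t*vx, t*vy) - f(0.0, 0.0)) / t

# Along the axes: zero, so grad f(0,0) = (0,0).
print(dd_at_origin(1, 0))   # ~ 0
print(dd_at_origin(0, 1))   # ~ 0

# Along v = (1/sqrt2, 1/sqrt2): the limit is vx^2 * vy = 1/(2*sqrt2) ~ 0.3536,
# but the dot-product formula would predict (0,0) . v = 0 -- so f is not
# differentiable at the origin.
s = 1/math.sqrt(2)
print(dd_at_origin(s, s))
```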
I suggest you try to imagine why the way the directional derivatives vary as we change direction in this case (think of the $xy$ plane as the floor) is not compatible with the existence of a tangent plane (differentiability).
(*) In order to verify that
$$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}=0,$$
first note that $\frac{|t|\,||v||}{t}$ equals plus or minus $||v||$, depending on the sign of $t$, so it is a bounded function of $t$ (for $t\neq 0$). Hence, to prove our claim it is enough to show that
$$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}=0.$$
But this is a consequence of $f$ being differentiable. Indeed, we say that $f\colon \mathbb R^n \rightarrow \mathbb R$ is differentiable at $x_0$ if and only if
$$\lim_{x\to x_0} \frac{f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)}{||x-x_0||}=0.$$
Our expression just has $x_0+tv$ instead of $x$, and since the limit is for $t\to 0$, we also have $x_0+tv\to x_0$. The only difference is that the definition of differentiability uses a full limit in $\mathbb R^n$ (think of sequences of points converging to $x_0$ from every direction, along all sorts of simple or complicated paths), while in our limit $x$ tends to $x_0$ only along the straight line in the direction of $v$. But since $f$ is differentiable at $x_0$, the full limit is $0$, and the same is then true when we restrict to the subset of $\mathbb R^n$ that is such a line.
I don’t know how much you already know, so I’m going to justify everything I can. There’s a paragraph about the gradient at the bottom. Also, the logic extends to more variables, so I’ll just talk about functions with two inputs (surfaces) instead of three; it might be more helpful. You also might want some visuals; there are several things online, and I believe Khan Academy has some helpful ones even though I’m not always a fan of their regular videos.
Consider the multivariable chain rule. Recall that it works intuitively because the change in $z$ arises from the combined changes in $x$ and in $y$. You can also try some practical thought experiments to help make it clear, all with the basic idea that for $z(x(t),y(t))$, the change of the height $z$ as you go through time (or, more generally, some other parameter) is the amount it changes due to $x$ plus the amount it changes due to $y$, since they are independent. You could readily substitute "amount" with "rate" in that previous sentence, which yields the multivariable chain rule, as desired.
Now, this $f(x(t),y(t))$ can be interpreted as a path along the surface $f$, lying above the curve $(x(t),y(t))$ in the $xy$ plane. When $(x(t),y(t))$ is a straight line and you move along it with constant speed, you get the derivative as you move in that direction (not just the $x$ or $y$ direction, which would give the partial derivatives). If you move at a different speed, say twice the speed, the height changes twice as much in the same amount of time, so the directional derivative also doubles. Directional derivatives, however, are taken while moving at unit speed (so they’re like a regular partial derivative but in a different direction); that is why the vector is normalized. Not normalizing just corresponds to moving at a different speed. For example (and because I don’t feel like normalizing this), suppose $\frac{dx}{dt}=3$ and $\frac{dy}{dt}=2$. As you move along the vector $(3,2)$ in a unit of time, the rate of change of $z$ is obtained by substituting these into the multivariable chain rule, which is algebraically equivalent to the gradient dotted with the vector you move along in a period.
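That last example can be sketched numerically (the surface $f$ and the starting point are my own hypothetical choices; only the velocities $\frac{dx}{dt}=3$, $\frac{dy}{dt}=2$ come from the text): the chain-rule/gradient value should match a direct difference quotient of $z(t)$.

```python
# Check of the multivariable chain rule with dx/dt = 3, dy/dt = 2,
# using the hypothetical surface z = f(x, y) = x**2 * y
# along the path x(t) = 1 + 3t, y(t) = 2 + 2t.
def f(x, y):
    return x**2 * y

def x_of(t): return 1 + 3*t
def y_of(t): return 2 + 2*t

# Numeric dz/dt at t = 0
h = 1e-6
dz_dt = (f(x_of(h), y_of(h)) - f(x_of(0), y_of(0))) / h

# Chain rule / gradient form: (df/dx, df/dy) . (dx/dt, dy/dt)
fx, fy = 2*x_of(0)*y_of(0), x_of(0)**2     # partials at (1, 2): (4, 1)
dot = fx*3 + fy*2                          # 4*3 + 1*2 = 14

print(dz_dt, dot)   # both ~ 14
```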
Additionally, the dot product says how much length of one vector lies along the other’s direction, because it is equivalent to projecting one vector onto the other and multiplying their lengths. Note: it doesn’t matter which is projected onto which, because the dot product is symmetric. Anyways, locally the gradient is the direction of steepest ascent, and dotting a vector with it asks how much this unit motion lines up with the direction of main (positive) ascent, the quantity you’re measuring. This comparison with the gradient is logically the same as asking how much of the increase (the gradient) is in a particular direction; hence, directional derivative.
*The gradient is how a function ascends, i.e., a certain amount in one direction plus a certain amount in the other(s): $\hat\imath\,\frac{\partial f}{\partial x}+\hat\jmath\,\frac{\partial f}{\partial y}$. Viewed in this light, it only makes sense that "how it’s ascending" points in the direction of steepest ascent. It’s just as the derivative of a single-variable function is a vector (which is just a scalar in that case) telling you loosely "how it’s changing"; if you can connect this to the idea that this vector points in the direction of increase according to how fast it’s increasing, you can see the analogy for higher-dimensional gradients. Of course, you could also prove this analytically in several ways, one of which is working backwards and optimizing the angle in the directional derivative (which, perhaps surprisingly, isn’t circular reasoning and doesn’t require you to already know about the gradient).
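The steepest-ascent claim can be checked by brute force (a sketch: the function, point, and 1-degree sweep are my own hypothetical choices): among all unit directions, the largest directional derivative occurs in the direction of the gradient.

```python
import math

# Sweep unit directions and confirm the directional derivative is largest
# in the gradient's direction. Hypothetical f(x, y) = x**2 + 2*y at (1, 1),
# where grad f(1, 1) = (2, 2), i.e. 45 degrees.
def f(x, y):
    return x**2 + 2*y

x0, y0 = 1.0, 1.0
h = 1e-6

best_angle, best_dd = None, -math.inf
for k in range(360):                       # 1-degree steps around the circle
    theta = math.radians(k)
    vx, vy = math.cos(theta), math.sin(theta)
    dd = (f(x0 + h*vx, y0 + h*vy) - f(x0, y0)) / h
    if dd > best_dd:
        best_angle, best_dd = theta, dd

grad_angle = math.atan2(2.0, 2.0)          # direction of grad f(1,1) = (2, 2)
print(math.degrees(best_angle), math.degrees(grad_angle))  # both ~ 45
```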
I haven’t thought this through too much, but since every analytic function has a tangent plane, you could also figure it out by working on a plane.
And again, choosing some real-world or more abstract variables for this and doing a thought experiment is fun. And like Mohammad said, like a normal slope, it’s a scalar.