The limit $\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t$ gives the definition of the derivative in the direction of the unit vector $v$ at $x=x_0\in \mathbb R^n$, that is $\frac{\partial}{\partial v} f (x_0)$.
The formula
$$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v$$
gives a property which is valid under the hypothesis that $f$ is differentiable at $x=x_0$, and is quite useful for calculations. (If $f$ is not differentiable at $x=x_0$, then that relation need not hold, even if all directional derivatives exist.)
The idea of the proof is that, since $f$ is differentiable at $x_0$, the gradient $\nabla f(x_0)$ exists and
$$\lim_{x\to x_0}\frac{|f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)|}{||x-x_0||}=0.$$
Consider the point $x=x_0+tv$ (for fixed $x_0$ and $v$). Starting from the definition of the directional derivative, and subtracting and adding $\nabla f(x_0)\cdot (x_0+tv-x_0)$ in the numerator, we get
$$\frac{\partial}{\partial v} f (x_0)=\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t=$$
$$=\lim_{t\to 0} \left[\frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}+\frac{\nabla f(x_0)\cdot(x_0+tv-x_0)}{t}\right].$$
Because the limit of the first summand is $0$ (why? see (*) below) and the second summand is the constant $\nabla f(x_0)\cdot v$, the result is $$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v,$$
which gives the usual formula.
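As a rough numerical sanity check of the formula, here is a minimal Python sketch. The test function $\sin(x)e^y$, the base point, the direction, and the step size are all assumptions chosen for illustration; none of them come from the argument above.

```python
import math

def f(x, y):
    # Hypothetical smooth (hence differentiable) test function.
    return math.sin(x) * math.exp(y)

def grad_f(x, y):
    # Gradient of the test function, computed by hand.
    return (math.cos(x) * math.exp(y), math.sin(x) * math.exp(y))

def difference_quotient(func, x0, y0, v, t):
    # The quotient (f(x0 + t v) - f(x0)) / t from the definition.
    vx, vy = v
    return (func(x0 + t * vx, y0 + t * vy) - func(x0, y0)) / t

# Assumed base point and unit direction.
x0, y0 = 0.3, -0.2
theta = 1.1
v = (math.cos(theta), math.sin(theta))

gx, gy = grad_f(x0, y0)
dot = gx * v[0] + gy * v[1]
quotient = difference_quotient(f, x0, y0, v, 1e-6)

print("difference quotient:", quotient)
print("gradient dot v:     ", dot)
```

For a small step the two printed numbers should agree to several decimal places, which is what the dot-product formula predicts for a differentiable $f$.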
To understand this relation, it may be more instructive to look at a case where it fails. Let $f \colon \mathbb R^2 \to \mathbb R$ be given by
$$f(x,y)=
\begin{cases}
\tfrac{x^2y}{x^2+y^2} & (x,y)\neq (0,0) \\
0 & (x,y)=(0,0). \\
\end{cases}$$
An easy calculation using the definition shows that, if $v=(v_x,v_y)$ (let's assume $||v||=1$), the directional derivative in the direction of $v$ is
$$\frac{\partial}{\partial v} f (0,0)=\frac{v_x^2 v_y}{v_x^2+v_y^2}=v_x^2 v_y$$
(in particular, both $\frac{\partial}{\partial x} f (0,0)$ and $\frac{\partial}{\partial y} f (0,0)$ are zero, that is, $\nabla f(0,0)=(0,0)$).
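Written out as a sketch (the intermediate steps are filled in here, not taken from the original), the calculation from the definition is, for $t\neq 0$,
$$\frac{f(tv_x,tv_y)-f(0,0)}{t}=\frac1t\cdot\frac{(tv_x)^2(tv_y)}{(tv_x)^2+(tv_y)^2}=\frac{t^3v_x^2v_y}{t^3(v_x^2+v_y^2)}=\frac{v_x^2v_y}{v_x^2+v_y^2},$$
which does not depend on $t$, so the limit as $t\to 0$ is this same value.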
So, if the 'dot-product formula' were valid, it should be the case that $$\frac{\partial}{\partial v} f (0,0)=(0,0)\cdot (v_x,v_y)=0,$$
which only happens in the directions of the $x$ and $y$ axes. (BTW, this also proves that $f$ is not differentiable at $(0,0)$.)
I suggest you try to imagine why the way in which the directional derivatives vary as we change direction in this case (think of the $xy$ plane as the floor) is not compatible with the existence of a tangent plane (differentiability).
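The failure of the dot-product formula for this $f$ can also be seen numerically. The following Python sketch evaluates the difference quotient at the origin in a few directions; the sampled angles and the step size are arbitrary choices made for illustration.

```python
import math

def f(x, y):
    # The counterexample: x^2 y / (x^2 + y^2), with f(0, 0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * x * y / (x * x + y * y)

def difference_quotient(vx, vy, t):
    # (f(0 + t v) - f(0)) / t at the origin.
    return (f(t * vx, t * vy) - f(0.0, 0.0)) / t

rows = []
for degrees in (0, 30, 45, 90, 135):
    theta = math.radians(degrees)
    vx, vy = math.cos(theta), math.sin(theta)
    measured = difference_quotient(vx, vy, 1e-6)
    predicted = vx * vx * vy  # value from the direct calculation
    rows.append((degrees, measured, predicted))
    print(f"{degrees:4d} deg  quotient={measured:+.6f}  v_x^2 v_y={predicted:+.6f}")
```

The quotient matches $v_x^2 v_y$ in every sampled direction, while $\nabla f(0,0)\cdot v$ would be $0$ everywhere; the two agree only along the axes.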
(*) In order to verify that
$$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}=0,$$
first note that $\frac{|t|\,||v||}{t}$ equals plus or minus $||v||$, depending on the sign of $t$, which means it is a bounded function of $t$ ($t\neq 0$). So, to prove our claim, it is enough to show that
$$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}=0.$$
But this is a consequence of $f$ being differentiable. Indeed, we say that $f\colon \mathbb R^n \rightarrow \mathbb R$ is differentiable at $x_0$ if and only if
$$\lim_{x\to x_0} \frac{f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)}{||x-x_0||}=0.$$
Our expression just has $x_0+tv$ instead of $x$, and as the limit is for $t\to 0$, it is also true that $x_0+tv\to x_0$. The only difference is that the definition of differentiable function uses a double/triple/etc. limit (think of sequences of points of $\mathbb R^n$ converging to $x_0$ from every direction and along all sorts of simple or complicated paths), while in our limit $x$ tends to $x_0$ only along the straight line in the direction of $v$. But since $f$ is differentiable at $x_0$, the full limit is $0$, and so the same is true if we restrict to the subset of $\mathbb R^n$ that is that line.
Why can't a simultaneous increase in x and y give a dramatically different result than either alone? (e.g. a function that rises in the x+ direction and the y+ direction, but falls dramatically along the diagonal?)
Well, it can, but then the function won't be differentiable. One concrete example of a function that behaves differently along the $x$ and $y$ axes than it does in between is the function $z = r\sin(2 \theta)$, in cylindrical coordinates. This function is not differentiable at the origin. It is continuous at the origin and has slopes of $0$ in the $x$ and $y$ directions there, since the $x$ and $y$ axes are both contained in the graph of the function. But in other directions the slope at the origin can take any value between $-1$ and $1$.
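To see these slopes concretely, here is a small Python sketch. It assumes the Cartesian form $z = 2xy/\sqrt{x^2+y^2}$ (which follows from $\sin 2\theta = 2\sin\theta\cos\theta$) and measures the slope along the ray leaving the origin at angle $\theta$, using only $t>0$; the sampled angles and step size are arbitrary.

```python
import math

def z(x, y):
    # r * sin(2*theta) written in Cartesian coordinates, with z(0, 0) = 0.
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0
    return 2.0 * x * y / r

def ray_slope(theta, t=1e-6):
    # One-sided slope at the origin along the ray at angle theta (t > 0).
    return (z(t * math.cos(theta), t * math.sin(theta)) - z(0.0, 0.0)) / t

slopes = {deg: ray_slope(math.radians(deg)) for deg in (0, 45, 90, 135)}
for deg, slope in slopes.items():
    print(f"{deg:4d} deg  slope={slope:+.6f}  sin(2*theta)={math.sin(2 * math.radians(deg)):+.6f}")
```

The slope is essentially zero along both axes but close to $1$ at $45^\circ$ and close to $-1$ at $135^\circ$, which no single plane through the origin can reproduce.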
Remember that a point and two slopes in non-parallel directions are all that we need to completely determine a plane. So, if the tangent plane to the graph of $f(x,y)$ is well defined at a point, the slopes of the tangent plane in the $x^+$ and $y^+$ directions completely characterize the plane. A plane, being flat, can't increase along the $x^+$ and $y^+$ axes and decrease in between.
If a function tried to do that, it would not be differentiable at the point in question - it would not be well approximated by the plane that the gradient determines. This is the source of the definition of differentiability: a differentiable function has its slope in each direction determined by that direction and the slopes in the $x^+$ and $y^+$ directions.
The same thing happens in one dimension; we are just too used to it to notice. You might ask, "why does the behavior of a function in the $x^+$ direction determine the behavior in the $x^-$ direction? Why can't a function rise in both the $x^+$ and $x^-$ directions?". Of course, a function can do that, like $y = |x|$ does. But then the function will not be differentiable at the point in question, because it will not be well approximated by the line that is determined by the rate of change in the $x^+$ direction.
The situation in two or more variables is no different. In one dimension, the slope in the $x^+$ direction determines a line. In two dimensions, the slopes in the $x^+$ and $y^+$ directions determine a plane. In either case, we define the function to be differentiable if, around the point we started with, the function is well approximated by that line or plane in every direction that we can go, given the number of dimensions we are working with.
As for why the gradient points in the direction of maximum increase, suppose we don't know this and want to find a unit vector $\vec{u}$ such that the directional derivative $D_{\vec{u}}f$ of some differentiable function $f$ is greatest in the direction of $\vec{u}$. If $\nabla f\neq 0$ and $\theta$ is the angle between $\nabla f$ and $\vec{u}$, we have $D_{\vec{u}}f=\nabla f \cdot \vec{u}=|\nabla f||\vec{u}|\cos(\theta)=|\nabla f|\cos(\theta)$, since $\vec{u}$ is a unit vector. This quantity is maximized when $\cos(\theta)=1$, i.e., when $\theta=0$, so $\vec{u}$ points in the direction of the gradient.
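The maximization argument can be illustrated with a brute-force search over unit vectors. This Python sketch assumes a hypothetical gradient value $(3,4)$ at some point and samples directions in one-degree steps; both choices are for illustration only.

```python
import math

# Hypothetical gradient at some point (an assumption, not from the text).
grad = (3.0, 4.0)
grad_norm = math.hypot(grad[0], grad[1])

def directional_derivative(theta):
    # grad . u for the unit vector u at angle theta.
    u = (math.cos(theta), math.sin(theta))
    return grad[0] * u[0] + grad[1] * u[1]

# Try every whole-degree direction and keep the one with the largest value.
best_deg = max(range(360), key=lambda d: directional_derivative(math.radians(d)))
best_value = directional_derivative(math.radians(best_deg))

# Angle of the gradient itself, for comparison.
grad_deg = math.degrees(math.atan2(grad[1], grad[0]))

print("best sampled direction (deg):", best_deg)
print("gradient direction (deg):    ", grad_deg)
print("best value vs |grad|:        ", best_value, grad_norm)
```

The best sampled direction is the whole-degree angle nearest the gradient's own angle, and the maximal value approaches $|\nabla f|$ from below, as the $\cos(\theta)$ formula predicts.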