(1) Note that $L_X Y = [X,Y]= \nabla_X Y - \nabla_Y X$. Moreover, on $\mathbb{R}^n$, the covariant derivative is given by $\nabla_X Y = (X(Y_1), \ldots, X(Y_n))$. If $\partial_i$ is the $i$-th coordinate vector field on $\mathbb{R}^n$, then
$$
\begin{align*}
L_{\nabla f}g(\partial_i,\partial_j) &= (\nabla f)(g(\partial_i,\partial_j))-g([\nabla f,\partial_i],\partial_j) - g(\partial_i, [\nabla f, \partial_j]) \\
&= (\nabla f)(\delta_{ij}) + g(\partial_i \nabla f,\partial_j) + g(\partial_i, \partial_j \nabla f) \\
&= 0 + \partial_i\partial_j f + \partial_j \partial_i f = 2\partial_i\partial_j f.
\end{align*}
$$
And this is equal to the $(i,j)$-th entry of the Hessian matrix of $f$ (apart from the factor 2 of course).
(2) For all smooth vector fields $X$, $Y$ we have
$$
\begin{align*}
L_{\nabla f}g(X,Y) &= (\nabla f)(g(X,Y)) - g([\nabla f,X], Y)- g(X,[\nabla f, Y])\\
&= g(\nabla_{\nabla f} X, Y) +g(X,\nabla_{\nabla f}Y) \\
&\qquad- g(\nabla_{\nabla f}X-\nabla_X \nabla f, Y)-g(X,\nabla_{\nabla f}Y- \nabla_Y\nabla f)\\
&= g(\nabla_X \nabla f, Y) + g(X,\nabla_Y\nabla f).
\end{align*}
$$
If we show that the two last terms are equal, then we are done. This follows from the definition of a gradient and the fact that the covariant derivative is torsion free (because we are working with the Levi-Civita connection).
$$
\begin{align*}
g(\nabla_X \nabla f,Y) -g(X,\nabla_Y\nabla f) &= X g(\nabla f,Y) - g(\nabla f, \nabla_X Y) \\
& \qquad- Y g(\nabla f,X) + g(\nabla f,\nabla_Y X) \\
&= X(Y(f)) - Y(X(f)) - [X,Y](f) = 0
\end{align*}
$$
The directional derivative along a direction $v$ in Euclidean space, as usually defined in calculus is, provided everything is sufficiently smooth,
\begin{equation}
v \cdot \mathrm{grad}(f).
\end{equation}
It is not required for $v$ to be a unit vector, I disagree with your last sentence. Of course you may prefer to consider unit vectors, and you can do that in Euclidean space as well as on a Riemannian manifold. No big deal, at the end you are just rescaling by a non-zero constant.
Best Answer
Think of it in terms of properties we want the gradient to have. Properties come first, and formulas are then written out so that they satisfy those properties.
The essential properties of the gradient vector field $\nabla f$ are:
The differential $df$ takes a vector field $X$ and returns a measurement of how quickly $f$ is increasing along the flow line of $X$. If $\nabla f$ has the properties above, then we should be able to obtain this measurement by taking the dot product of $X$ with the gradient field $\nabla f$. So the defining property of the vector field $\nabla f$ is that, for all vector fields $X$, we have: $$ g(\nabla f, X) = df(X) = \partial_Xf $$