Your first question is somewhat abstract, but two observations are useful.
- Both derivatives are defined independently of any metric.
- Both derivatives are assigned artificially, but in different senses.
If you like, both can be regarded as directional derivatives, but what "directional" means in each case needs further clarification.
A covariant derivative $\nabla_XY$ defines the rate of change of $Y$ as you move in the direction of $X$, for instance along the geodesic determined by $X$; the connection $\nabla$ is what makes vectors at different points comparable. If the connection has special properties, say it is the Riemannian (Levi-Civita) connection, which preserves the Riemannian metric and is torsion-free, then $\nabla_XY$ is the rate of change of $Y$ measured against parallel transport along that curve, and parallel transport keeps the norms of vectors and the angles between them unchanged.
By contrast, the Lie derivative $L_XY=\left[X,Y\right]$ corresponds to the rate of change of $Y$ along the flow induced by $X$. It does not define this rate in the way a connection does, because the change of one vector field along the flow of another is already determined once the two fields are given. This "flow" is hard to explain briefly since it involves more abstract machinery. Intuitively, imagine a drop of ink, which is immediately deformed and translated by a water flow. Similarly, for two given vector fields $X$ and $Y$, how $Y$ is "deformed" and "translated" by $X$ is well-defined everywhere. It is in this sense that you compare $Y$ before and after it changes at each location.
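To make the contrast concrete, here is a sketch in local coordinates. The Christoffel symbols $\Gamma^{\mu}_{\nu\lambda}$ of the connection are notation I am assuming here, and the example fields are my own choice for illustration. In components,
$$
(\nabla_XY)^{\mu}=X^{\nu}\partial_{\nu}Y^{\mu}+\Gamma^{\mu}_{\nu\lambda}X^{\nu}Y^{\lambda},\qquad
(L_XY)^{\mu}=X^{\nu}\partial_{\nu}Y^{\mu}-Y^{\nu}\partial_{\nu}X^{\mu}.
$$
The first needs extra data (the $\Gamma$'s) but involves $X$ only through its value at the point; the second needs no extra data but involves the derivatives of $X$. For example, on $\mathbb{R}^2$ with the flat connection ($\Gamma=0$), take the rotation field $X=-y\,\partial_x+x\,\partial_y$ and the constant field $Y=\partial_x$. Since the components of $Y$ are constant and $Y^{\nu}\partial_{\nu}X^{\mu}=\partial_x(-y,\,x)=(0,1)$, one gets
$$
\nabla_XY=0,\qquad L_XY=\left[X,Y\right]=-\partial_y.
$$
So $Y$ does not change at all with respect to the flat connection, yet it does change relative to the rotating flow of $X$.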
For your second question, recall that $X_p:C^{\infty}(M)\to\mathbb{R}$ is called a tangent vector at $p\in M$, where $M$ is a differentiable manifold, if
- $X_p(\lambda f+\mu g)=\lambda X_p(f)+\mu X_p(g)$ for all $\lambda,\mu\in\mathbb{R}$ and $f,g\in C^{\infty}(M)$, and
- $X_p(fg)=f(p)X_p(g)+X_p(f)g(p)$ for all $f,g\in C^{\infty}(M)$.
As you can see, although we call $X_p$ a vector, it is actually an operator acting on smooth functions on $M$. One can further show that, locally, $X_p$ admits the representation
$$
X_p=X^{\mu}(p)\frac{\partial}{\partial x^{\mu}}\bigg|_p,
$$
where each $X^{\mu}(p)\in\mathbb{R}$ (for a vector field $X$, the components $X^{\mu}$ are smooth functions on the coordinate chart), while the $\partial_{\mu}$ play the role of basis vectors in linear algebra. They are called a basis because every tangent vector at $p$ is a linear combination of them, and each of them is a differential operator in the usual sense.
With this understanding, you may see that $L_Xf$, the directional derivative of $f$ with respect to $X$, is exactly $X(f)$, because
$$
X(f)=X^{\mu}\partial_{\mu}f.
$$
Of course, you may still adopt the notation of vector calculus and write
$$
X(f)=\mathbf{X}\cdot\nabla f=\frac{\partial f}{\partial\mathbf{X}}.
$$
Nevertheless, these notations are not conventional in differential geometry.
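As a small worked example (the field and function are my own choice for illustration, not taken from your question), take $M=\mathbb{R}^2$ with coordinates $(x,y)$, the rotation field $X=-y\,\partial_x+x\,\partial_y$, and $f=x^2+y^2$. Then
$$
L_Xf=X(f)=X^{\mu}\partial_{\mu}f=(-y)(2x)+(x)(2y)=0,
$$
which agrees with $\mathbf{X}\cdot\nabla f=(-y,\,x)\cdot(2x,\,2y)=0$. This is as expected: the flow of $X$ moves points along circles centred at the origin, and $f$ is constant on each such circle.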
Best Answer
I think you are confusing several things. In the first video you linked, the manifold $M$ is embedded in an ambient flat space; the covariant derivative of a tensor along a path in $M$ is then indeed the usual derivative with respect to the ambient coordinates "minus the normal component". As far as I can understand, what you mean by "intrinsic plane" is NOT embedded isometrically into an ambient flat space, so there is no sense in which there is a "normal derivative"; this description does not apply to manifolds described in intrinsic terms, without an isometric embedding. For those one uses the machinery of Levi-Civita connections and all that.
Now, if you start with a manifold isometrically embedded in a flat space (or, say, use the (difficult) Nash embedding theorem to embed your $M$ in this way), then you can compute the derivative of the metric tensor by differentiating the (flat, constant) ambient metric tensor, "removing the normal component", and then restricting to the appropriate tensor subbundle corresponding to vectors and covectors tangent to $M$. But of course the ambient metric tensor has zero derivative, being constant. "Removing components" from zero and "restricting" zero still gives zero, so the resulting covariant derivative of the metric tensor on $M$ is also zero, just as the intrinsic computation said.
In some sense, the point is that the ambient metric is constant, and it is its restriction that changes as we move in $M$; since covariant differentiation takes the derivative first and restricts afterwards, the result is $0$.
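Here is a sketch of the same argument in formulas, assuming $M$ is isometrically embedded in a flat space. The notation is mine: $D$ is the ordinary (flat) derivative in the ambient space, $\langle\cdot,\cdot\rangle$ is the constant ambient inner product, $(\cdot)^{\top}$ is the projection onto the tangent space of $M$, so that $\nabla_XY=(D_XY)^{\top}$ and $g(Y,Z)=\langle Y,Z\rangle$ for vector fields $X,Y,Z$ tangent to $M$. Then
$$
X\big(g(Y,Z)\big)=\langle D_XY,Z\rangle+\langle Y,D_XZ\rangle=\langle (D_XY)^{\top},Z\rangle+\langle Y,(D_XZ)^{\top}\rangle=g(\nabla_XY,Z)+g(Y,\nabla_XZ),
$$
where the normal components drop out because $Y$ and $Z$ are tangent to $M$. Hence
$$
(\nabla_Xg)(Y,Z)=X\big(g(Y,Z)\big)-g(\nabla_XY,Z)-g(Y,\nabla_XZ)=0.
$$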