I'm having some trouble understanding the covariant derivative as a directional derivative for tensors. The way the covariant derivative was presented to me was by first showing that a vector field can provide a directional derivative for smooth functions on a manifold.

We then asked how we could get the directional derivative for higher rank tensors. We started by listing a number of qualities we wanted our new derivative operator to have. Since we want a directional derivative, we need a vector to provide a direction and a tensor to differentiate, so the covariant derivative would be a map taking a vector and a tensor of type $(s,k)$ to another tensor of type $(s,k)$.

Since this was a derivative we wanted linearity, and we also wanted it to obey the Leibniz rule. We then talked about how, based on these rules, we had some freedom in how we picked the derivative, which we later showed can be fixed by a choice of the connection coefficients. Finally, we applied the covariant derivative to a vector in some particular chart, and got the component expression for the covariant derivative of a vector.

I could follow all of the algebraic manipulations up to this point; however, none of this jibes with my understanding of a derivative, which is something that measures how something else changes.

We've additionally talked about parallelity and the autoparallel equation $\nabla(V,V) = 0$ (where $\nabla$ is the covariant derivative and $V$ is a vector) but none of this has really helped me understand the covariant derivative as a directional derivative. Could anyone help me out?

Thanks.

## Best Answer

I'm not sure what exactly you're asking for, but I'll try explaining how the directional derivative operator acts on a smooth function (at least the way I grasp this notion), which is the first thing one encounters when looking at tangent vectors in the context of topology and differentiable manifolds.

Let $M$ be a smooth manifold. Then, we can construct a vector space $(C^\infty(M),+,\cdot)$, where $C^\infty(M)$ is the set of all infinitely many times differentiable (i.e. smooth) maps $f:M\to\mathbb R$, $+$ is the pointwise addition and $\cdot$ is the pointwise scalar multiplication. Now let $\gamma:\mathbb R\to M$ be a smooth curve through the point $p\in M$. W.l.o.g. we can take $p=\gamma(0)$, and we define the directional derivative operator at the point $p$ along the curve $\gamma$ as the linear map: $$X_{\gamma,p}:C^\infty(M)\tilde{\to}\mathbb R$$ $$:f\mapsto(f\circ\gamma)'(0)$$ where $f\circ\gamma:\mathbb R\to\mathbb R$ and $(f\circ\gamma)'(0)\in\mathbb R$.
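
To make the definition concrete, here is a minimal numerical sketch, assuming $M$ is the unit sphere $S^2\subset\mathbb R^3$. The curve `gamma`, the functions `f` and `g`, and the helper `directional_derivative` are all my own illustrative choices, and the derivative $(f\circ\gamma)'(0)$ is only approximated by a central difference. It checks that $X_{\gamma,p}$ behaves like a derivative should: it is linear and obeys the Leibniz rule.

```python
import math

def gamma(t):
    """An (arbitrarily chosen) smooth curve on the unit sphere S^2; p = gamma(0)."""
    theta, phi = 1.0 + t, 2.0 * t
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def directional_derivative(func, curve, h=1e-5):
    """Approximate X_{curve,p}(func) = (func o curve)'(0) by a central difference."""
    return (func(curve(h)) - func(curve(-h))) / (2.0 * h)

# Two smooth functions on the sphere (restrictions of functions on R^3).
def f(q):
    x, y, z = q
    return x + y * z

def g(q):
    x, y, z = q
    return math.exp(x) * z

p = gamma(0.0)
X_f = directional_derivative(f, gamma)
X_g = directional_derivative(g, gamma)

# Linearity: X(2f + 3g) should equal 2 X(f) + 3 X(g).
X_lin = directional_derivative(lambda q: 2.0 * f(q) + 3.0 * g(q), gamma)
linearity_error = abs(X_lin - (2.0 * X_f + 3.0 * X_g))

# Leibniz rule: X(f g) should equal f(p) X(g) + g(p) X(f).
X_prod = directional_derivative(lambda q: f(q) * g(q), gamma)
leibniz_error = abs(X_prod - (f(p) * X_g + g(p) * X_f))

print("linearity error:", linearity_error)
print("Leibniz error:  ", leibniz_error)
```

Both errors should be tiny (limited only by the finite-difference approximation), which is the sense in which $X_{\gamma,p}$ "measures how $f$ changes" as you move through $p$ along $\gamma$.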

We further define the set $$T_pM:=\{X_{\gamma,p}\;\mid\;\gamma\;\text{a smooth curve passing through}\; p\in M\}$$ and equip it with the closed operations $\oplus,\odot$: $$\oplus:T_pM\times T_pM\to T_pM$$ $$\odot:\mathbb R\times T_pM\to T_pM$$ which stand for vector addition and scalar multiplication on $T_pM$. There is no need to spell out the values of these two maps for now. This set, together with these two operations, constitutes the tangent vector space, i.e. the space whose elements are the directional derivative operators.

As the $X\in T_pM$ are operators (in the above framework), we can understand their behaviour by letting them act on a smooth function $f:M\to\mathbb R$. So let us choose a chart $(U,x)$, which is an element of the smooth atlas $\mathscr A_M$ on $M$. Let us also consider $\dim(M)$-many curves $\gamma_a$, s.t. $$\gamma_a:\mathrm{preim}_{\gamma_a}(U)\to U$$ with $(x^b\circ\gamma_a)(\lambda)=\delta^b_a\lambda$ for $a=1,\cdots,\dim(M)$. Here $U$ is an open subset of $M$, $\lambda$ is the curve parameter, and $x^b$ is the $b$-th component of $x$, which is smooth because $x:U\to x(U)\subseteq\mathbb R^d$ (with $d=\dim(M)$) is smooth due to the atlas being smooth. The index $a$ just labels the curve. For convenience we choose a chart s.t. $x(p)=0_{\mathbb R^d}$, so that $\gamma_a(0)=p$. As you see, we made very specific choices. We now act on a smooth $f$ with our operator (the repeated index $b$ is summed over): \begin{align*} X_{\gamma_a,p}(f):&=(f\circ\gamma_a)'(0)\\ &=(f\circ x^{-1}\circ x\circ \gamma_a)'(0)\\ &=\partial_b(f\circ x^{-1})(x(\gamma_a(0)))(x^b\circ\gamma_a)'(0)\\ &=\partial_b(f\circ x^{-1})(x(p))\delta^b_a\\ &=\partial_a(f\circ x^{-1})(x(p))=:\left(\dfrac{\partial}{\partial x^a}\right)_p(f) \end{align*} which means that $$\left(\dfrac{\partial}{\partial x^a}\right)_p:=X_{\gamma_a,p}$$ In order to go from the second line to the third, we used the multivariable chain rule (funny looking that way), where $(\partial_a)_{x(p)}:C^\infty(\mathbb R^d)\to\mathbb R$. The key fact here is that in order to write the tangent vector as a partial derivative we need a chart, i.e. we need coordinates, so that the partial derivative has meaning as a derivative with respect to a component of the coordinate map. On the manifold itself, we cannot directly have such a thing. One also proves that this definition satisfies the Leibniz rule and that $$\left(\dfrac{\partial}{\partial x^a}\right)_p\qquad,a=1,\cdots,\dim(M)$$ are linearly independent and generate the tangent vector space $T_pM$, i.e. they form a basis.
Obviously, letting this act on vector fields requires an understanding of tangent bundles $TM$, vector fields (as sections of $TM$), modules, rings etc.
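
The chain-rule step in the chart computation above can also be checked numerically. The following is a rough sketch under my own assumptions: $M$ is the unit sphere, $p$ is the north pole, and the chart is stereographic projection from the south pole (so that $x(p)=0$). The names `chart`, `chart_inv`, `gamma`, `f` and `d_dt` are hypothetical, and all derivatives are central-difference approximations. It compares $(f\circ\gamma)'(0)$ with $\partial_b(f\circ x^{-1})(x(p))\,(x^b\circ\gamma)'(0)$ for a curve $\gamma$ that is not one of the coordinate curves.

```python
import math

def chart(q):
    """Stereographic chart x: U -> R^2 (projection from the south pole); x(north pole) = (0, 0)."""
    X, Y, Z = q
    return (X / (1.0 + Z), Y / (1.0 + Z))

def chart_inv(u):
    """Inverse chart x^{-1}: R^2 -> U on the unit sphere."""
    a, b = u
    s = 1.0 + a * a + b * b
    return (2.0 * a / s, 2.0 * b / s, (1.0 - a * a - b * b) / s)

def f(q):
    """A smooth function on the sphere (restriction of a function on R^3)."""
    X, Y, Z = q
    return math.exp(X) * Z + 3.0 * Y

def gamma(t):
    """An (arbitrarily chosen) smooth curve on the sphere with gamma(0) = north pole."""
    phi = 0.7 + t
    return (math.sin(t) * math.cos(phi), math.sin(t) * math.sin(phi), math.cos(t))

def d_dt(func, h=1e-5):
    """Central-difference approximation of func'(0) for func: R -> R."""
    return (func(h) - func(-h)) / (2.0 * h)

# Left-hand side: the chart-free definition (f o gamma)'(0).
lhs = d_dt(lambda t: f(gamma(t)))

# Partial derivatives of f o x^{-1} at x(p) = 0, i.e. (d/dx^b)_p (f).
def coordinate_point(b, s):
    """The point s * e_b in R^2."""
    return tuple(s if i == b else 0.0 for i in range(2))

partials = [d_dt(lambda s, b=b: f(chart_inv(coordinate_point(b, s)))) for b in range(2)]

# Components of the tangent vector in the chart: (x^b o gamma)'(0).
components = [d_dt(lambda t, b=b: chart(gamma(t))[b]) for b in range(2)]

# Right-hand side: sum over b of partial_b(f o x^{-1})(x(p)) * (x^b o gamma)'(0).
rhs = sum(d * c for d, c in zip(partials, components))

print("chart-free:", lhs)
print("in chart:  ", rhs)
```

The two numbers should agree up to the finite-difference error. The list `components` holds exactly the coefficients of $X_{\gamma,p}$ with respect to the chart-induced basis $\left(\partial/\partial x^b\right)_p$.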

I hope I helped and did not mess things up. Truth be told, I think it's better to stick to the definitions and not to form ill-defined pictures in our minds. It's really helpful if you draw all these maps and see the compositions yourself.