Derivation of the expression for the second covariant derivative

connections, differential-geometry

This is a direct follow-up to my earlier question: Reconciling different expressions for Riemann curvature tensor. I think it deserves a separate question, and I don't want to bother the answerer of the earlier post with repeated queries in the comments. The answer there highlights this identity:
$$\nabla_{\partial_a}(\nabla_{\partial_b}Z)=\nabla_{\nabla_{\partial_a}\partial_b}Z+\nabla^2_{\partial_a,\partial_b}Z\tag{1}$$
The above identity is derived by calculating $\nabla_X(\nabla_Y(Z))$. I will avoid the notational shortcuts that people adopt with experience. Here is my version of the calculation (the only version of the product rule for covariant derivatives I know is $\nabla_X(fY)=X(f)Y+f\nabla_X(Y)$, which is what I'm using):
$$\nabla_X(\nabla_Y(Z))=X^a\nabla_{\partial_a}(Y^b\nabla_{\partial_b}(Z))=X^a(\partial_aY^b)(\nabla_{\partial_b}(Z))+X^aY^b\nabla_{\partial_a}(\nabla_{\partial_b}(Z))$$
$$=X(Y^b)(\nabla_{\partial_b}(Z))+X^aY^b\nabla_{\partial_a}(\nabla_{\partial_b}(Z))$$
$$=\nabla_X(Y^b)(\nabla_{\partial_b}(Z))+X^aY^b\nabla_{\partial_a}(\nabla_{\partial_b}(Z))\tag{2}$$

Now on the RHS, I have a problem with both terms!


First term: It seems that the 1st RHS term is supposed to equal $\nabla_{\nabla_X(Y)}(Z)$. But then
$$\nabla_{\nabla_X(Y)}(Z)=\nabla_{(\nabla_X(Y))^b\partial_b}(Z)=(\nabla_X(Y))^b\nabla_{\partial_b}(Z)\tag{3}$$
Let me evaluate $(\nabla_X(Y))^b$:
$$(\nabla_X(Y))^b=(\nabla_X(Y^c\partial_c))^b=\big(X(Y^c)\partial_c+Y^c\nabla_X(\partial_c)\big)^b=X(Y^b)+(Y^c\nabla_X(\partial_c))^b$$
$$=\nabla_X(Y^b)+Y^c(X^d\nabla_{\partial_d}(\partial_c))^b=\nabla_X(Y^b)+Y^c(X^d\Gamma^e_{dc}\partial_e)^b$$
$$=\nabla_X(Y^b)+Y^cX^d\Gamma^b_{dc}\tag{4}$$
This means $$\nabla_{\nabla_X(Y)}(Z)=(\nabla_X(Y))^b\nabla_{\partial_b}(Z)\neq \nabla_X(Y^b)(\nabla_{\partial_b}(Z))\tag{5}$$

How is this inconsistency resolved?


Second term: The 2nd RHS term is supposed to be $\nabla_{X,Y}^2(Z)$. This means that $$\big(\nabla_{X,Y}^2(Z)\big)^c=\big(\ \ X^aY^b\nabla_{\partial_a}(\nabla_{\partial_b}(Z))\ \ \big)^c=X^aY^b\big(\nabla_{\partial_a}(\nabla_{\partial_b}(Z))(\text{d}x^c)\big)$$
where the last term involves the $(1,0)$-tensor $\nabla_{\partial_a}(\nabla_{\partial_b}(Z))$ acting on $\text{d}x^c$.

But on Wikipedia, the following equation is given: $(\nabla^2_{u,v}w)^a=u^cv^b\nabla_c\nabla_bw^a$ (I don't know whether this equation is a result or a definition), which can be rewritten as
$$\big(\nabla_{X,Y}^2(Z)\big)^c=X^aY^b\nabla_a\nabla_bZ^c=X^aY^b\big(\nabla\nabla Z(\text{d}x^c,\partial_a,\partial_b)\big)$$

It boils down to showing that $\nabla\nabla Z(\text{d}x^c,\partial_a,\partial_b)=\nabla_{\partial_a}(\nabla_{\partial_b}(Z))(\text{d}x^c)$. Now by definition of total covariant derivative,
$$\nabla(\mathcal{A})(\omega^1,\ldots,\omega^r,X_1,\ldots,X_s,X)=\nabla_X(\mathcal{A})(\omega^1,\ldots,\omega^r,X_1,\ldots,X_s)$$

So $$\nabla_{\partial_a}(\nabla_{\partial_b}(Z))(\text{d}x^c)=\nabla(\nabla_{\partial_b}(Z))(\text{d}x^c,\partial_a)\tag{6}$$ and $$\nabla\nabla Z(\text{d}x^c,\partial_a,\partial_b)\equiv\nabla(\nabla(Z))(\text{d}x^c,\partial_a,\partial_b)=\nabla_{\partial_b}(\nabla(Z))(\text{d}x^c,\partial_a)\tag{7}$$
How do I even equate these two (the RHS of eqs. 6 and 7)? Is there some property of the total covariant derivative that I'm missing?

Best Answer

Forget components. Seriously. First, you don’t need them, and second, although the first part of your calculation is technically true, it is not going to get you what you need. You’re only using the product rule for the covariant derivative for functions times vector fields. Notice that the terms $\nabla_X\nabla_YZ$ and $\nabla_{\nabla_XY}Z$ are both just $\nabla$ acting on a vector field ($\nabla_YZ$ in the first case and $Z$ in the second case), so if you don’t use the definition of how $\nabla$ acts on higher order tensor fields (like $\nabla Z$), then you won’t be able to produce the term $\nabla\nabla Z$; and this is precisely the issue you’re encountering… equation (2) is correctly calculated, but the two terms there are not yet $\nabla_{\nabla_XY}Z$ and $\nabla\nabla Z(Y,X)$. We need to exploit that covariant derivatives also commute with higher order contractions. First, I’ll present the more ‘down to earth’ (interpret that however you wish) calculation, and then later the more ‘abstract’ but also elegant approach.


First way of doing the calculations.

You already know how the covariant derivative $\nabla T$ works and that $(\nabla T)(\cdots,X)=(\nabla_XT)(\cdots)$. One other thing you need to know and use is the general product rule (true by definition of how $\nabla$ acts on higher-order tensor fields): \begin{align} \nabla_X\left(T(\omega^1,\dots,\omega^r,X_1,\dots, X_s)\right)&=(\nabla_XT)(\omega^1,\dots, \omega^r,X_1,\dots, X_s)\\ &+\sum_{i=1}^rT\left(\omega^1,\dots, \nabla_X(\omega^i),\dots, \omega^r,X_1,\dots, X_s\right)\\ &+ \sum_{j=1}^sT\left(\omega^1,\dots, \omega^r,X_1,\dots,\nabla_X(X_j),\dots, X_s\right), \end{align} i.e. $\nabla_X$ of a fully evaluated tensor field just hits everything, left to right.
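
As a quick illustration, here are the two special cases of this rule that the calculation of $\nabla\nabla Z$ relies on, written out as a sketch: the vector field $Z$ viewed as a $(1,0)$ tensor field, and the $(1,1)$ tensor field $\nabla Z$. Nothing here goes beyond the rule itself. \begin{align} \nabla_X\left(Z(\omega)\right)&=(\nabla_XZ)(\omega)+Z(\nabla_X\omega),\\ \nabla_X\left((\nabla Z)(\omega,Y)\right)&=\left(\nabla_X(\nabla Z)\right)(\omega,Y)+(\nabla Z)(\nabla_X\omega,Y)+(\nabla Z)(\omega,\nabla_XY). \end{align} On the left-hand sides $\nabla_X$ acts on a scalar function, so it is simply $X$ applied to that function.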

The object $\nabla\nabla Z$ is a $(1,2)$ tensor field, so let us fully evaluate it on $(\omega,Y,X)$ (the reason for this ordering instead of $(\omega,X,Y)$ is because of the convention you’re using that the vector field in the last entry is the direction along which we differentiate). So, \begin{align} (\nabla\nabla Z)(\omega,Y,X)&:=\left(\nabla_X(\nabla Z)\right)(\omega,Y)\\ &=\nabla_X\left((\nabla Z)(\omega,Y)\right)-(\nabla Z)(\nabla_X\omega,Y)-(\nabla Z)(\omega,\nabla_XY)\\ &=\nabla_X\left((\nabla_YZ)(\omega)\right)-(\nabla_YZ)(\nabla_X\omega)-(\nabla_{\nabla_XY}Z)(\omega)\\ &=(\nabla_X\nabla_YZ)(\omega)+(\nabla_YZ)(\nabla_X\omega) -(\nabla_YZ)(\nabla_X\omega)-(\nabla_{\nabla_XY}Z)(\omega)\\ &= (\nabla_X\nabla_YZ)(\omega) -(\nabla_{\nabla_XY}Z)(\omega)\\ &=\left(\nabla_X\nabla_YZ-\nabla_{\nabla_XY}Z\right)(\omega). \end{align} Since this is true for all $\omega$, we see that \begin{align} (\nabla\nabla Z)(\cdot,Y,X)&=\nabla_X\nabla_YZ-\nabla_{\nabla_XY}Z. \end{align} The expression on the left is defined to be $\nabla^2_{X,Y}Z$.
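
To connect this with equation (2) of the question, here is a sketch in coordinates. It assumes the question's convention $\nabla_{\partial_a}\partial_b=\Gamma^e_{ab}\partial_e$ and uses only the identity just derived together with the fact that $\nabla^2_{X,Y}Z=(\nabla\nabla Z)(\cdot,Y,X)$ is $C^\infty$-linear in $X$ and $Y$ (because $\nabla\nabla Z$ is a tensor field). Applying the identity to the coordinate fields gives $$\nabla_{\partial_a}\nabla_{\partial_b}Z=\nabla^2_{\partial_a,\partial_b}Z+\nabla_{\nabla_{\partial_a}\partial_b}Z=\nabla^2_{\partial_a,\partial_b}Z+\Gamma^e_{ab}\nabla_{\partial_e}Z,$$ so the second term of equation (2) is not yet $\nabla^2_{X,Y}Z$; it carries an extra Christoffel piece: $$X^aY^b\nabla_{\partial_a}\nabla_{\partial_b}Z=\nabla^2_{X,Y}Z+X^aY^b\Gamma^e_{ab}\nabla_{\partial_e}Z.$$ Substituting into equation (2) and comparing with equation (4), \begin{align} \nabla_X\nabla_YZ&=X(Y^e)\nabla_{\partial_e}Z+X^aY^b\Gamma^e_{ab}\nabla_{\partial_e}Z+\nabla^2_{X,Y}Z\\ &=\left(X(Y^e)+X^aY^b\Gamma^e_{ab}\right)\nabla_{\partial_e}Z+\nabla^2_{X,Y}Z\\ &=(\nabla_XY)^e\nabla_{\partial_e}Z+\nabla^2_{X,Y}Z\\ &=\nabla_{\nabla_XY}Z+\nabla^2_{X,Y}Z. \end{align} In other words, the Christoffel term missing from the first term of equation (2) is exactly the one hiding in the second term, and neither term needs to match on its own.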


Second way of doing the calculations.

It is equivalent, but more convenient, to define the covariant derivative by requiring that it commute with all contractions and satisfy the Leibniz rule for tensor products. You’ll hopefully see what I mean as I go along. The benefit is that now I can consider any tensor field $T$ (I could have done that above as well, but the intermediate steps may have gotten a little messy). Given any vector field $X$, we first note that \begin{align} \nabla_XT&=(\nabla T)(\cdots, X)=\text{tr}((\nabla T)\otimes X). \end{align} So, \begin{align} \nabla_X\nabla_YT&=\nabla_X\left(\text{tr}((\nabla T)\otimes Y)\right)\\ &=\text{tr}\left(\nabla_X\left((\nabla T)\otimes Y\right)\right)\tag{$\nabla_X$ commutes with $\text{tr}$}\\ &=\text{tr}\left(\left(\nabla_X\nabla T\right)\otimes Y+ (\nabla T)\otimes \nabla_XY\right)\tag{Leibniz rule}\\ &=\text{tr}\left((\nabla\nabla T)(\cdots, X)\otimes Y\right)+\nabla_{\nabla_XY}T\\ &=(\nabla\nabla T)(\cdots,Y,X) +\nabla_{\nabla_XY}T, \end{align} where the last two lines hold essentially by definition, and once again, the first term on the last line is what is commonly denoted $\nabla^2_{X,Y}T$.
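
For those who do want components in the end, here is a sketch of what the first term looks like for $T=Z$. It assumes the question's convention $\nabla_{\partial_a}\partial_b=\Gamma^e_{ab}\partial_e$ and the convention that the differentiation direction goes in the last slot. The shorthand $(\nabla Z)^c{}_b:=(\nabla Z)(\text{d}x^c,\partial_b)$ is introduced here only for illustration. Using $\nabla_{\partial_a}\text{d}x^c=-\Gamma^c_{ad}\,\text{d}x^d$, \begin{align} (\nabla Z)^c{}_b&=\partial_bZ^c+\Gamma^c_{bd}Z^d,\\ (\nabla\nabla Z)(\text{d}x^c,\partial_b,\partial_a)&=\left(\nabla_{\partial_a}(\nabla Z)\right)(\text{d}x^c,\partial_b)\\ &=\partial_a\left((\nabla Z)^c{}_b\right)+\Gamma^c_{ad}(\nabla Z)^d{}_b-\Gamma^d_{ab}(\nabla Z)^c{}_d, \end{align} and therefore $$\left(\nabla^2_{X,Y}Z\right)^c=(\nabla\nabla Z)(\text{d}x^c,Y,X)=X^aY^b\,(\nabla\nabla Z)(\text{d}x^c,\partial_b,\partial_a).$$ If the index expression $\nabla_a\nabla_bZ^c$ is read as the component $(\nabla\nabla Z)(\text{d}x^c,\partial_b,\partial_a)$, this is the formula quoted from Wikipedia. Note the slot order: with this convention the outer derivative index $a$ sits in the last slot, and the term $-\Gamma^d_{ab}(\nabla Z)^c{}_d$ is precisely what distinguishes this component from $\left(\nabla_{\partial_a}(\nabla_{\partial_b}Z)\right)(\text{d}x^c)$.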