Reconciling Expressions for Riemann Curvature Tensor – Differential Geometry

Tags: curvature, differential-geometry, riemannian-geometry

[Note: This has been crossposted to Physics SE, but I haven't found a thorough explanation there so far, so I'm posting the question here as well.] I'm using the Einstein summation convention throughout. Also, since I'm new to this subject, I'd be really grateful if any assertions could be shown through explicit calculations so that I can follow along and learn in the process.

I'm reading Carroll's GR notes and I'm having trouble deciphering a particular expression for the Riemann curvature tensor. The coordinate-free definition is (eq. 3.71 in Carroll's notes):
$$R(X,Y)Z=\nabla_X\nabla_YZ-\nabla_Y\nabla_XZ-\nabla_{[X,Y]}Z\tag{3.71}$$
Some more background on the first equation, based on my reading of "Semi-Riemannian Geometry: The Mathematical Language of General Relativity" by Newman:

$R$ is a map from $\mathfrak{X}(M)^3$ to $\mathfrak{X}(M)$ such that (3.71) holds. Newman's book doesn't really treat $R(X,Y)$ as a separate object (as far as I've read). $R$ is just treated as a map of 3 vector fields and with the weird notation $R(X,Y)Z$ instead of $R(X,Y,Z)$ (again, as far as I've read). Then there is a theorem showing that $R$ is multilinear in all its arguments and anti-symmetric in its first 2 arguments. And then the following assertion:

Let $(M,\nabla)$ be a smooth manifold with a connection, and let $(U,(x^i))$ be a chart on $M$. Then in local coordinates, $R(\partial/\partial x^{\mu},\partial/\partial x^{\nu})\partial/\partial x^{\sigma}$ can be expressed as: $$R\bigg(\frac{\partial}{\partial x^{\mu}},\frac{\partial}{\partial x^{\nu}}\bigg)\frac{\partial}{\partial x^{\sigma}}=R_{\ \ \sigma\mu\nu}^{\rho}\frac{\partial}{\partial x^{\rho}}$$

Based on the above, if $V\in\mathfrak{X}(M)$, then
$$R\bigg(\frac{\partial}{\partial x^{\mu}},\frac{\partial}{\partial x^{\nu}}\bigg)V=R\bigg(\frac{\partial}{\partial x^{\mu}},\frac{\partial}{\partial x^{\nu}}\bigg)\bigg(V^{\sigma}\frac{\partial}{\partial x^{\sigma}}\bigg)
\\=V^{\sigma}R\bigg(\frac{\partial}{\partial x^{\mu}},\frac{\partial}{\partial x^{\nu}}\bigg)\frac{\partial}{\partial x^{\sigma}}=V^{\sigma}R_{\ \ \sigma\mu\nu}^{\eta}\frac{\partial}{\partial x^{\eta}}$$

where the second equality holds due to multilinearity of $R$.
Now if I act this on the $x^{\rho}$ coordinate function, I get
$$\bigg(R\bigg(\frac{\partial}{\partial x^{\mu}},\frac{\partial}{\partial x^{\nu}}\bigg)V\bigg)(x^{\rho})\equiv\bigg(R\bigg(\frac{\partial}{\partial x^{\mu}},\frac{\partial}{\partial x^{\nu}}\bigg)V\bigg)^{\rho}=V^{\sigma}R_{\ \ \sigma\mu\nu}^{\eta}\frac{\partial}{\partial x^{\eta}}(x^{\rho})
\\=V^{\sigma}R_{\ \ \sigma\mu\nu}^{\eta}\frac{\partial x^{\rho}}{\partial x^{\eta}}=R_{\ \ \sigma\mu\nu}^{\rho}V^{\sigma}\tag{1}$$

So far, so good. Important note: I'm using the notation $\nabla_{\mu}\equiv\nabla_{\frac{\partial}{\partial x^{\mu}}}$, where $\partial/\partial x^{\mu}$ is the $\mu$-th basis vector field on $(U,(x^i))$.

Now in eq. (3.71), I can replace $X,Y$ by fields $\partial_{\mu},\partial_{\nu}$ respectively and $Z$ by $V$, then I can get the local coordinates for both sides by acting them on coordinate function $x^{\rho}$:
$$(R(\partial_{\mu},\partial_{\nu})V)(x^{\rho})\equiv(R(\partial_{\mu},\partial_{\nu})V)^{\rho}=([\nabla_{\mu},\nabla_{\nu}]V)(x^{\rho})-(\nabla_{[\partial_{\mu},\partial_{\nu}]}V)(x^{\rho})
\\=([\nabla_{\mu},\nabla_{\nu}]V)(x^{\rho})\equiv([\nabla_{\mu},\nabla_{\nu}]V)^{\rho}\tag{2}$$

Comparing (1) and (2), I get $$R_{\ \ \sigma\mu\nu}^{\rho}V^{\sigma}=([\nabla_{\mu},\nabla_{\nu}]V)^{\rho}\tag{3}$$

The above equation completely follows from (3.71). Now in Carroll's notes, an index-based expression for the Riemann curvature tensor is also given (eq. 3.66):
$$R^{\rho}_{\ \ \sigma\mu\nu}V^{\sigma}=[\nabla_{\mu},\nabla_{\nu}]V^{\rho}+T_{\mu\nu}^{\ \ \ \ \lambda}\nabla_{\lambda}V^{\rho}\tag{3.66}$$

So far, no torsion-free assumption has been made anywhere. To reconcile (3.71) with (3.66), I have to show (comparing the RHS of (3) and (3.66)):
$$([\nabla_{\mu},\nabla_{\nu}]V)^{\rho}=[\nabla_{\mu},\nabla_{\nu}]V^{\rho}+T_{\mu\nu}^{\ \ \ \ \lambda}\nabla_{\lambda}V^{\rho}\tag{4}$$

This is where I'm stuck and I really cannot wrap my head around it. How do I go about proving eq. (4)? For the first term on the RHS, I can only infer that it's $[\nabla_{\mu},\nabla_{\nu}](V^{\rho})$. If it were $([\nabla_{\mu},\nabla_{\nu}]V)^{\rho}$, then it would equal the LHS and the 2nd RHS term would have to vanish, making it meaningless to include; this can't be. But then if it's $[\nabla_{\mu},\nabla_{\nu}](V^{\rho})$, well, $V^{\rho}$ is a $C^{\infty}(U)$ function and this would evaluate to $\partial_{\mu}\partial_{\nu}V^{\rho}-\partial_{\nu}\partial_{\mu}V^{\rho}=0$ by equality of mixed partial derivatives.

I'm not sure where I've gone wrong in the specific approach outlined in this question. I'd appreciate any help or corrections!


EDIT: Some calculations I've done after looking at the accepted answer:
$$A=\nabla_X(\nabla_YZ)=\nabla_X(\nabla_{Y^b\partial_b}Z)=\nabla_X(Y^b\nabla_{\partial_b}Z)$$
$$=X(Y^b)\nabla_{\partial_b}Z+Y^b\nabla_X(\nabla_{\partial_b}Z)=X(Y^b)\nabla_{\partial_b}Z+Y^b\nabla_{X^a\partial_a}(\nabla_{\partial_b}Z)$$
$$=X(Y^b)\nabla_{\partial_b}Z+Y^bX^a\nabla_{\partial_a}(\nabla_{\partial_b}Z)$$
$$\\$$
$$B=\nabla_{\nabla_XY}Z=\nabla_{\nabla_X(Y^b\partial_b)}Z=\nabla_{X(Y^b)\partial_b+Y^b\nabla_X\partial_b}Z$$
$$=\nabla_{X(Y^b)\partial_b}Z+\nabla_{Y^b\nabla_X\partial_b}Z=X(Y^b)\nabla_{\partial_b}Z+Y^b\nabla_{\nabla_X\partial_b}Z$$
$$=X(Y^b)\nabla_{\partial_b}Z+Y^bX^a\nabla_{\nabla_{\partial_a}\partial_b}Z$$
$$\\$$
$$A-B=X^aY^b\nabla_a\nabla_bZ
\\\implies \nabla_{\partial_a}(\nabla_{\partial_b}Z)-\nabla_{\nabla_{\partial_a}\partial_b}Z=\nabla_a\nabla_bZ\tag{5}$$

So now I have to convince myself that eq. (5) is correct.

Best Answer

See also the questions https://physics.stackexchange.com/q/342268 and https://math.stackexchange.com/a/2174322/261022.

The problem in your computation is mostly notational: the usual convention distinguishes between $\nabla_X\nabla_YZ$ and $X^aY^b\nabla_a\nabla_bZ$, so that it is in fact not the standard notation to let $\nabla_{\partial_a}\nabla_{\partial_b}Z=\nabla_a\nabla_bZ$. Rather, the correct expression also differentiates $\partial_b$ along $\partial_a$, namely $$\nabla_{\partial_a}\nabla_{\partial_b}Z=\nabla_a\nabla_bZ+\nabla_{(\nabla_{\partial_a}\partial_b)}Z.$$ Using this formula in your calculation should give you the expression you are looking for.
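To spell out that last step, here is a sketch of how the formula feeds into the coordinate-free definition (3.71). Here $\nabla_\mu\nabla_\nu V$ denotes $\nabla\nabla V(\partial_\mu,\partial_\nu)$. The sketch assumes the conventions $\nabla_{\partial_\mu}\partial_\nu=\Gamma^{\lambda}_{\mu\nu}\partial_\lambda$ and $T_{\mu\nu}^{\ \ \ \ \lambda}=\Gamma^{\lambda}_{\mu\nu}-\Gamma^{\lambda}_{\nu\mu}$, which I believe match Carroll's notes; a different sign or index-order convention would change the sign of the torsion term. Since $[\partial_\mu,\partial_\nu]=0$:

$$\begin{aligned}
R(\partial_\mu,\partial_\nu)V &= \nabla_{\partial_\mu}\nabla_{\partial_\nu}V-\nabla_{\partial_\nu}\nabla_{\partial_\mu}V \\
&= \big(\nabla_\mu\nabla_\nu V+\nabla_{(\nabla_{\partial_\mu}\partial_\nu)}V\big)-\big(\nabla_\nu\nabla_\mu V+\nabla_{(\nabla_{\partial_\nu}\partial_\mu)}V\big) \\
&= [\nabla_\mu,\nabla_\nu]V+\nabla_{(\nabla_{\partial_\mu}\partial_\nu-\nabla_{\partial_\nu}\partial_\mu)}V \\
&= [\nabla_\mu,\nabla_\nu]V+T_{\mu\nu}^{\ \ \ \ \lambda}\,\nabla_{\partial_\lambda}V.
\end{aligned}$$

Taking the $\rho$ component of both sides gives $R^{\rho}_{\ \ \sigma\mu\nu}V^{\sigma}=[\nabla_\mu,\nabla_\nu]V^{\rho}+T_{\mu\nu}^{\ \ \ \ \lambda}\nabla_\lambda V^{\rho}$, which is (3.66). Under these conventions, the torsion term is exactly the difference between commuting $\nabla_{\partial_\mu}$ and $\nabla_{\partial_\nu}$ as operators on vector fields and antisymmetrizing the tensor $\nabla\nabla V$.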

To see why the above expression should be the right one, consider the following computation: $$\nabla_X(\nabla_YZ)=X^a\nabla_a(Y^b\nabla_bZ)=X^a(\nabla_aY^b)\nabla_bZ+X^aY^b\nabla_a\nabla_bZ=\nabla_{\nabla_XY}Z+X^aY^b\nabla_a\nabla_bZ$$ so that $\nabla\nabla Z(X,Y)=\nabla\left(\nabla Z(Y)\right)(X)-\nabla Z(\nabla_XY)$ for all vector fields $X,Y,Z$. The linked questions and answers also provide some more details, but it boils down to this.
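In components, the same relation can be checked directly. The following is a sketch assuming the convention $\nabla_{\partial_a}\partial_b=\Gamma^{d}_{ab}\partial_d$ (the symbol $\Gamma$ and this index order are my assumption, not something fixed by the text above) and writing $\nabla_{\partial_b}Z=(\nabla_bZ^c)\partial_c$:

$$\begin{aligned}
\nabla_{\partial_a}(\nabla_{\partial_b}Z) &= \nabla_{\partial_a}\big((\nabla_bZ^c)\partial_c\big)=\big(\partial_a(\nabla_bZ^c)+\Gamma^{c}_{ad}\nabla_bZ^d\big)\partial_c,\\
\nabla_{(\nabla_{\partial_a}\partial_b)}Z &= \Gamma^{d}_{ab}\nabla_{\partial_d}Z=\Gamma^{d}_{ab}(\nabla_dZ^c)\partial_c,\\
\nabla_a\nabla_bZ^c &= \partial_a(\nabla_bZ^c)+\Gamma^{c}_{ad}\nabla_bZ^d-\Gamma^{d}_{ab}\nabla_dZ^c.
\end{aligned}$$

The last line is the usual formula for the covariant derivative of a $(1,1)$ tensor with components $\nabla_bZ^c$. The $-\Gamma^{d}_{ab}$ term acting on the lower index $b$ is precisely what $\nabla_{\partial_a}(\nabla_{\partial_b}Z)$ alone is missing.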


Addendum: the covariant derivative $\nabla$ is a connection on the vector bundle $TM\to M$. This means that for any section $Z$ of $TM$ (i.e. a vector field) and any tangent vector $v\in T_pM$, the expression $\nabla_vZ$ is a tangent vector at $p$, so for a vector field $X$ the expression $\nabla_XZ$ is another section of $TM$. So, if $X$ and $Y$ are vector fields, it makes sense to consider $\nabla_X\left(\nabla_Y Z\right)$, but notice that this is not tensorial in $X,Y$ (it is not $C^{\infty}$-linear in $Y$). This would cause some problems if we defined $\nabla_a$ to be $\nabla_{\partial_a}$: not for the derivative of a single vector field, but once you iterate the covariant derivative, things get more confusing, as we saw.

Anyway, a notation like $\nabla_a\nabla_b T^c$ usually denotes the components of the tensor $\nabla\nabla T$, namely we define $\nabla_a\nabla_b T^c$ to be the unique locally defined functions such that $$\nabla\nabla T=\nabla_a\nabla_b T^c\,\mathrm{d}x^a\otimes\mathrm{d}x^b\otimes\partial_{x^c}$$ so $\nabla_a\nabla_b T^c=\mathrm{d}x^c\left(\nabla\nabla T(\partial_{x^a},\partial_{x^b})\right)$.

Similarly, $\nabla_XY=X^a\nabla_aY^b\partial_{x^b}$, but $\nabla_a Y^b$ is not the derivative of the function $Y^b$. Rather, $\nabla_aY^b=\mathrm{d}x^b(\nabla Y(\partial_{x^a}))$.
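Written out, and assuming the convention $\nabla_{\partial_a}\partial_c=\Gamma^{b}_{ac}\partial_b$ (an assumption on my part; check it against your reference), this reads:

$$\nabla_{\partial_a}Y=\nabla_{\partial_a}(Y^c\partial_c)=(\partial_aY^b+\Gamma^{b}_{ac}Y^c)\,\partial_b\quad\Longrightarrow\quad\nabla_aY^b=\partial_aY^b+\Gamma^{b}_{ac}Y^c.$$

In particular, $[\nabla_\mu,\nabla_\nu]V^{\rho}$ in (3.66) is not the commutator of partial derivatives applied to the function $V^{\rho}$, which is why it does not vanish identically.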