Representing tensors using matrix notation is often confusing, but let's assume that
$Y = \begin{pmatrix}y^1_1&y^1_2\\y^2_1&y^2_2\end{pmatrix}$
and similarly for $Z$. If $W = Y \otimes Z$ (the tensor product), then the components of $W$ are
$w^{ik}_{jl} = y^i_j z^k_l$
You have represented $W$ as a $4\times 4$ matrix, but it would be more accurate to represent it as a $2\times 2$ matrix, each of whose entries is another $2\times 2$ matrix.
Anyway, the four possible contractions of W are:
(1): $w^{ik}_{il} = y^i_i z^k_l$
(2): $w^{ik}_{jk} = y^i_j z^k_k$
(3): $w^{ik}_{ji} = y^i_j z^k_i$
(4): $w^{ij}_{jl} = y^i_j z^j_l$
In terms of matrix operations:
(1) is the component representation of $Tr(Y)Z$
(2) is the component representation of $Tr(Z)Y$
(3) is the component representation of $ZY$
(4) is the component representation of $YZ$
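If it helps, all four identities can be checked numerically. Here is a small sketch using NumPy's `einsum` (the arrays `Y`, `Z`, `W` are just randomly generated placeholders):

```python
import numpy as np

rng = np.random.default_rng(42)
Y = rng.normal(size=(2, 2))
Z = rng.normal(size=(2, 2))

# w^{ik}_{jl} = y^i_j z^k_l, stored as W[i, k, j, l]
W = np.einsum('ij,kl->ikjl', Y, Z)

# (1) w^{ik}_{il} = Tr(Y) z^k_l
assert np.allclose(np.einsum('ikil->kl', W), np.trace(Y) * Z)
# (2) w^{ik}_{jk} = Tr(Z) y^i_j
assert np.allclose(np.einsum('ikjk->ij', W), np.trace(Z) * Y)
# (3) w^{ik}_{ji} = (ZY)^k_j
assert np.allclose(np.einsum('ikji->kj', W), Z @ Y)
# (4) w^{ij}_{jl} = (YZ)^i_l
assert np.allclose(np.einsum('ijjl->il', W), Y @ Z)
```

A repeated index within one `einsum` operand that is absent from the output is summed along the diagonal, which is exactly the contraction operation.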
Parts [1] and [2]
The essence of why this is the case comes down to the fact that polar coordinates form an orthogonal coordinate system away from the origin. Closely related coordinate systems go by the names "Geodesic Polar Coordinates" and "Normal Coordinates," and they tend to be incredibly useful because, at the origin of such coordinates, the metric looks like the Euclidean metric.
The central part of your question boils down to how the connection $\nabla$ on the tangent bundle $TM$ acts on $(0,0)$- and $(1,0)$-tensors. Remember that the entire point of defining a connection on the tangent bundle was to generalize the notion of directional derivatives, so we will keep this in mind as we proceed. Without getting too deep into the details of tensors, we just need to note a couple of things about the connection.
If we assume that our tangent bundle is given a smooth local frame $\{E_i\}$ (for example, in $\mathbb{R}^2$ the local frame in Cartesian coordinates is given by the vector fields $\{\partial_x,\partial_y\}$, and in polar coordinates by $\{\partial_r,\partial_\theta\}$), then for vector fields $X,Y \in T^{(1,0)}TM \cong TM$, the map is defined by
$$\nabla:T^{(1,0)}TM \times T^{(1,0)}TM \rightarrow T^{(1,0)}TM:(Y,X)\mapsto\nabla_X Y = (X(Y^k) + X^iY^j\Gamma^k_{ij})E_k$$
However, this is the action on vector fields. When we want to act on functions (which are $(0,0)$-tensors), then we need to change things a little. Again, we want the connection to be our generalization of the directional derivative, so we must have that
$$\nabla:T^{(0,0)}TM \times T^{(1,0)}TM \rightarrow T^{(1,0)}TM:(f,X)\mapsto\nabla_X f = X(f)$$
And if we were to have that $X = E_i = \partial_i$, then we would have that
$$\nabla_X f = \nabla_{\partial_i}f = \partial_i[f]$$
As a small note, in the above presentation I have not made any real reference to the metric on the space, but if you require that a connection $\nabla$ be both symmetric and compatible with the metric $g$, then you necessarily have the Levi-Civita connection.
This gets to your central question. In the composition $\nabla_a\nabla_b = \nabla_a(\nabla_b)$, the two derivatives are going to be acting on different things. The innermost derivative acts on a function, so it acts according to the directional derivative formula, meaning we may write $\nabla_b f = \partial_b f$. However, the outside derivative is then acting on the quantities $\partial_b f$, which are the components of the covariant $1$-form $df$, so it acts according to the formula $\nabla_a\nabla_b f = \partial_a\partial_b f - \Gamma^d_{ab}\partial_d f$ (the sign on the Christoffel term flips relative to the vector-field formula because $df$ is covariant).
Part [3]
I think that you may have gotten a little confused with the notation here. We are using the Einstein summation convention here, so
$$\begin{align}
g^{ab}(\partial_a\partial_b - \Gamma^d_{ab}\partial_d) &= g^{rr}\left(\partial_r\partial_r - \sum_d\Gamma^d_{rr}\partial_d\right)\\
&\qquad +g^{r\theta}\left(\partial_r\partial_\theta - \sum_d\Gamma^d_{r\theta}\partial_d\right)\\
&\qquad +g^{\theta r}\left(\partial_\theta\partial_r - \sum_d\Gamma^d_{\theta r}\partial_d\right)\\
&\qquad +g^{\theta\theta}\left(\partial_\theta\partial_\theta - \sum_d\Gamma^d_{\theta\theta}\partial_d\right)
\end{align}$$
This is where the fact that we are using polar coordinates becomes important. The coordinate directions are orthogonal, so $g^{r\theta} = g^{\theta r} = 0$, and the formula reduces to
$$\begin{align}
g^{ab}(\partial_a\partial_b - \Gamma^d_{ab}\partial_d) &= g^{rr}\left(\partial_r\partial_r - \sum_d\Gamma^d_{rr}\partial_d\right) +g^{\theta\theta}\left(\partial_\theta\partial_\theta - \sum_d\Gamma^d_{\theta\theta}\partial_d\right)
\end{align}$$
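To sanity-check this reduction, one can compare it numerically against the ordinary flat Laplacian computed in Cartesian coordinates. The sketch below hardcodes the polar Christoffel symbols ($\Gamma^r_{\theta\theta} = -r$ is the only symbol that survives in the $\theta\theta$ block) and uses finite differences; the test function $f(r,\theta) = r^3\sin(2\theta)$ is just an arbitrary smooth choice:

```python
import math

# The only nonzero Christoffel symbols for ds^2 = dr^2 + r^2 dtheta^2:
#   Gamma^r_{theta theta} = -r,   Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r

def laplacian_polar(f, r, t, h=1e-4):
    """g^{rr}(d_r d_r f - Gamma^d_{rr} d_d f) + g^{tt}(d_t d_t f - Gamma^d_{tt} d_d f)."""
    f_rr = (f(r + h, t) - 2*f(r, t) + f(r - h, t)) / h**2
    f_tt = (f(r, t + h) - 2*f(r, t) + f(r, t - h)) / h**2
    f_r = (f(r + h, t) - f(r - h, t)) / (2*h)
    # Gamma^d_{rr} = 0, and -(1/r^2) * Gamma^r_{tt} * f_r = +(1/r) f_r
    return f_rr + f_tt / r**2 + f_r / r

def laplacian_cartesian(f, r, t, h=1e-4):
    """The ordinary flat Laplacian of the same function, computed in x, y coordinates."""
    x, y = r * math.cos(t), r * math.sin(t)
    def g(x, y):
        return f(math.hypot(x, y), math.atan2(y, x))
    g_xx = (g(x + h, y) - 2*g(x, y) + g(x - h, y)) / h**2
    g_yy = (g(x, y + h) - 2*g(x, y) + g(x, y - h)) / h**2
    return g_xx + g_yy

# Arbitrary test function, smooth away from the origin
f = lambda r, t: r**3 * math.sin(2*t)
assert abs(laplacian_polar(f, 1.3, 0.7) - laplacian_cartesian(f, 1.3, 0.7)) < 1e-3
```

The two values agree to finite-difference accuracy, which is exactly the statement that the contracted formula above computes the Laplacian in polar coordinates.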
Parts [4] and [5]: These look good!
Let me know if this helped!
Additional Section Added For Clarity
In the passage that you have pictured in your post (thank you for including that, by the way), they are specifically highlighting the fact that if you move from a connection expressed in terms of a contravariant basis (e.g. a basis of vector fields $\{E_i\}$) to the dual, covariant basis of differential $1$-forms $\{\varepsilon^i\}$, the values of the Christoffel symbols just change by a sign. If you haven't dealt with covariant and contravariant vectors much before, you can view vectors as column vectors from linear algebra, covectors as row vectors, and the transformation as a change-of-basis operation.
As for when $\nabla_b = \partial_b$, this is only guaranteed to happen when your metric is flat, or when the connection acts on a $(0,0)$-tensor (a function). You can view a function either as being neither covariant nor contravariant, so that there are no Christoffel symbols at all, or as being both covariant and contravariant, so that the Christoffel symbols cancel out courtesy of the sign flip between the covariant and contravariant formulas. I will take the former interpretation since it is a little easier to deal with.
Forgive me if I am wrong, but I think that the confusion here is coming from the fact that we are specifically considering the action of the composition $\nabla_a\nabla_b$ on a function and not just an individual $\nabla_k$, so let's take some time to break this composition down a little more.
The symbol $\nabla(f,b) = \nabla_b [f]$ is referring to a map that sends a vector field $b$ paired with a function $f$ to another vector field $\nabla_b$ which will act on the same function $f$. If $f$ was a covariant or contravariant tensor, then we would expect $\nabla_b$ to have Christoffel symbols that would dictate how this new vector field acts, however, since $f$ is neither covariant nor contravariant, we get no such Christoffel symbols, and the action is determined completely by just feeding the function to the vector field directional derivative-style. So, if we just forget the function $f$ for a moment, we can view $\nabla$ as a map that eats the coordinate vector field $\partial_b$ and spits out the vector field $\nabla_b$.
Now, when we consider the map $\nabla(\nabla_b,a) = \nabla_a(\nabla_b)$, we get something a bit different. In this case, the symbol is referring to a map that sends a vector field $\partial_a$ paired with another vector field $\nabla_b$ to a third vector field $\nabla_a\nabla_b = \nabla_a(\nabla_b)$. Since $\nabla_b$ is a vector field, if we were to write things out in terms of the definition, we get that
$$\nabla_a(\nabla_b) = \sum_{k\in \{a,b\}}\left[\left([\partial_a][(\nabla_b)^k] - \sum_{i,j\in\{a,b\}}\left[(\partial_a)^i(\nabla_b)^j \Gamma^k_{ij}\right]\right)\partial_k\right]$$
where the superscripts are denoting the coordinate functions.
In the special case that we are looking at, we know that $(\partial_a)^i = 1$ if $i = a$ and $0$ if $i = b$, so the sum over $i$ collapses and our formula reduces to
$$\nabla_a(\nabla_b) = \sum_{k\in \{a,b\}}\left[\left( [\partial_a][(\nabla_b)^k] - \sum_{j\in\{a,b\}}(\nabla_b)^j \Gamma^k_{aj}\right)\partial_k\right].$$
Now we plug in our function, and since $f$ is a function that is neither covariant nor contravariant, we know that $\nabla_b$ acts on $f$ according to $\nabla_b[f] = \partial_b[f]$. This gives us
$$\begin{align}\nabla_a\nabla_b[f] &= \nabla_a(\nabla_b)[f]\\
&=\sum_{k\in \{a,b\}}\left[\left([\partial_a][(\partial_b)^k] - \sum_{j\in\{a,b\}}(\partial_b)^j \Gamma^k_{aj}\right)\partial_k\right][f]\\
&= \partial_a\partial_b[f] - \sum_{k\in \{a,b\}}\left[ \Gamma^k_{ab}\partial_k[f]\right]
\end{align}$$
where we were able to reduce down to the final term because we know that $(\partial_b)^j = 1$ if $j=b$ and $0$ if $j=a$.
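As a quick numerical sanity check of $\nabla_a\nabla_b[f] = \partial_a\partial_b[f] - \Gamma^k_{ab}\partial_k[f]$, here is a finite-difference sketch of the mixed $(r,\theta)$ component in polar coordinates, where $\Gamma^\theta_{r\theta} = 1/r$ and $\Gamma^r_{r\theta} = 0$ (the test function is an arbitrary smooth choice, and the closed form is worked out by hand in the comments):

```python
import math

# Polar Christoffel symbols entering the (r, theta) component:
#   Gamma^theta_{r theta} = 1/r,   Gamma^r_{r theta} = 0
GAMMA_THETA_R_THETA = lambda r: 1.0 / r

h = 1e-4
f = lambda r, t: r**3 * math.sin(2*t)  # arbitrary smooth test function

def d_t(f, r, t):
    return (f(r, t + h) - f(r, t - h)) / (2*h)

def d_r_d_t(f, r, t):
    return (f(r + h, t + h) - f(r + h, t - h)
            - f(r - h, t + h) + f(r - h, t - h)) / (4*h*h)

# nabla_r nabla_theta [f] = d_r d_theta [f] - Gamma^theta_{r theta} d_theta [f]
def hessian_r_theta(f, r, t):
    return d_r_d_t(f, r, t) - GAMMA_THETA_R_THETA(r) * d_t(f, r, t)

# Closed form for this f: 6 r^2 cos(2t) - 2 r^2 cos(2t) = 4 r^2 cos(2t)
r0, t0 = 1.5, 0.4
assert abs(hessian_r_theta(f, r0, t0) - 4 * r0**2 * math.cos(2*t0)) < 1e-3
```

Note that the naive $\partial_r\partial_\theta[f]$ alone does not match the closed form; the Christoffel correction is what makes the result a genuine tensor component.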
Let me know if you need any additional clarification!
References for Covariant Derivative
- Riemannian Manifolds (1997) by John Lee p. 50
- https://conf.math.illinois.edu/~kapovich/423-14/covariant.pdf
- https://en.wikipedia.org/wiki/Covariant_derivative#Functions (Wikipedia is actually great for a lot of general math things)
- https://www.youtube.com/watch?v=EFKBp52LtDM&t=0s followed by https://www.youtube.com/watch?v=cEEahoUUGyc
- https://graphics.stanford.edu/courses/cs468-13-spring/assets/lecture11-mildenhall.pdf
- Differential Geometry and Its Applications by John Oprea p. 82 (this is nice since he goes through everything very explicitly in terms of coordinates)
Best Answer
Why would you be allowed to arbitrarily change indices for a single term on the RHS? Equation (b) is a priori not true (it’s true for specific choices of $\Gamma$, for example the flat connection in a specific coordinate system), but I don’t see why it should hold in general.
Let’s take a specific example. Fix an arbitrary connection $\nabla$ on the tangent bundle, and a coordinate system. Then, the torsion of $\nabla$ is a $(1,2)$-tensor field which has components \begin{align} T^a_{\,bc}&=\Gamma^a_{\,bc}-\Gamma^a_{\,cb}.\tag{$*$} \end{align} Now, you can’t just arbitrarily say “the LHS has index $a$ upstairs and indices $b,c$ downstairs so we also have $T^a_{\,bc}=\Gamma^a_{cb}-\Gamma^a_{bc}$”. In this second equation, both sides still have index $a$ upstairs and indices $b,c$ downstairs; but this is total nonsense (if this equation were true, we’d find that the torsion is zero, which is not true in general).
What you are allowed to do is change all the indices everywhere: \begin{align} \begin{cases} T^{a}_{\,cb}&=\Gamma^a_{\,cb}-\Gamma^a_{\,bc}\\ T^{@}_{\,\sharp\flat}&=\Gamma^@_{\,\sharp\flat}-\Gamma^@_{\,\flat\sharp}\\ T^{\alpha}_{\,\beta\delta}&=\Gamma^{\alpha}_{\,\beta\delta}-\Gamma^{\alpha}_{\,\delta\beta}, \end{cases} \end{align} where $a,b,c,@,\sharp,\flat,\alpha,\beta,\delta\in\{1,\dots, n\}$. These are all examples of valid equations which follow directly from the definition $(*)$. Here, I have made the appropriate changes everywhere, not just shuffling around indices for a few specific terms.
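The point is easy to see numerically: for a generic (torsionful) array of connection coefficients, relabeling the indices everywhere is consistent with the definition, while shuffling the indices of a single term would force the torsion to vanish. A small sketch with a randomly generated $\Gamma$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# A generic (non-symmetric) array of connection coefficients, Gamma[a, b, c] = Gamma^a_{bc}
Gamma = rng.normal(size=(n, n, n))

# Definition (*): T^a_{bc} = Gamma^a_{bc} - Gamma^a_{cb}
T = Gamma - Gamma.transpose(0, 2, 1)

# Valid relabeling (b and c swapped EVERYWHERE): T^a_{cb} = Gamma^a_{cb} - Gamma^a_{bc}
assert np.allclose(T.transpose(0, 2, 1), Gamma.transpose(0, 2, 1) - Gamma)

# The invalid "shuffle one term" version T^a_{bc} = Gamma^a_{cb} - Gamma^a_{bc}
# would say T = -T, i.e. T = 0, which fails for a generic connection:
assert not np.allclose(T, -T)
```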
Edit:
I just took a look at your link and you already have several great answers, so I’m not really sure what the issue is. Anyway, here’s what I’ll say. Let us give definitions for certain sub-terms: \begin{align} \nabla_d\nabla_cv^a=\underbrace{\partial_d\partial_cv^a}_{A^a_{cd}}+\underbrace{\left(\partial_d\Gamma_{bc}^a\right)v^b}_{B^a_{cd}}+\underbrace{(-\Gamma_{cd}^b\partial_bv^a-\Gamma_{cd}^b\Gamma_{eb}^av^e)}_{C^{a}_{cd}}+ \underbrace{\Gamma_{bd}^a\Gamma_{ec}^bv^e}_{=D^a_{cd}}+ \underbrace{(\Gamma_{bc}^a\partial_dv^b+\Gamma_{bd}^a\partial_cv^b)}_{E^a_{cd}}. \end{align} So, $\nabla_d\nabla_cv^a=A^a_{cd}+B^{a}_{cd}+C^a_{cd}+D^a_{cd}+E^a_{cd}$ is a sum of five terms. Now, by inspecting the definitions, the following is very obvious: $A,C,E$ are symmetric in the lower two indices. We have $A^a_{cd}=A^a_{dc}$ due to equality of mixed partials. We have $C^a_{cd}=C^a_{dc}$ because you’re assuming a torsion-free connection in that post. Lastly, $E^a_{cd}=E^a_{dc}$ is very clear (if you want, the reason is commutativity of addition of real numbers).
So, out of these five terms, three of them are symmetric in the lower indices $c,d$, so if you subtract $\nabla_{c}\nabla_dv^a$, then only the $B,D$ terms will contribute: \begin{align} \nabla_d\nabla_cv^a-\nabla_c\nabla_dv^a&=(B^a_{cd}-B^a_{dc})+(D^a_{cd}-D^a_{dc})\\ &=(\partial_d\Gamma^a_{bc}v^b-\partial_c\Gamma^a_{bd}v^b)+ (\Gamma_{bd}^a\Gamma_{ec}^bv^e-\Gamma_{bc}^a\Gamma_{ed}^bv^e). \end{align} Now, you can rename some dummy indices to figure out what the components of the Riemann curvature are.
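To see that the surviving $B,D$ terms really do assemble into the curvature, here is a numerical sketch on the unit sphere. It uses one common sign convention, $R^a_{\,bcd} = \partial_c\Gamma^a_{db} - \partial_d\Gamma^a_{cb} + \Gamma^a_{ce}\Gamma^e_{db} - \Gamma^a_{de}\Gamma^e_{cb}$ (conventions differ between books), and compares against the well-known component $R^\theta_{\,\phi\theta\phi} = \sin^2\theta$:

```python
import math

# Christoffel symbols of the unit round sphere in (theta, phi) coordinates;
# indices: 0 = theta, 1 = phi, and Gamma(a, b, c) means Gamma^a_{bc}.
def Gamma(a, b, c, th):
    if (a, b, c) == (0, 1, 1):
        return -math.sin(th) * math.cos(th)
    if (a, b, c) in ((1, 0, 1), (1, 1, 0)):
        return math.cos(th) / math.sin(th)
    return 0.0

def dth_Gamma(a, b, c, th, h=1e-5):
    # Finite-difference theta-derivative (nothing depends on phi)
    return (Gamma(a, b, c, th + h) - Gamma(a, b, c, th - h)) / (2*h)

def riemann(a, b, c, d, th):
    """R^a_{bcd} = d_c Gamma^a_{db} - d_d Gamma^a_{cb}
                 + Gamma^a_{ce} Gamma^e_{db} - Gamma^a_{de} Gamma^e_{cb}."""
    dc = dth_Gamma(a, d, b, th) if c == 0 else 0.0
    dd = dth_Gamma(a, c, b, th) if d == 0 else 0.0
    quad = sum(Gamma(a, c, e, th) * Gamma(e, d, b, th)
               - Gamma(a, d, e, th) * Gamma(e, c, b, th) for e in range(2))
    return dc - dd + quad

# Known curvature of the unit sphere: R^theta_{phi theta phi} = sin^2(theta)
th = 1.1
assert abs(riemann(0, 1, 0, 1, th) - math.sin(th)**2) < 1e-6
```

The antisymmetry in the last two indices, $R^a_{\,bcd} = -R^a_{\,bdc}$, is built into the formula, which is exactly the subtraction of $\nabla_c\nabla_d v^a$ from $\nabla_d\nabla_c v^a$ above.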