The short answer is that the order of the indices does matter, and the reason is that once you introduce a metric tensor, you (or at least some people) are constantly raising and lowering indices.
A lot of authors say "transpose" when they really mean the adjoint. The adjoint of a map $A$ with respect to a metric $g$ is the linear transformation $A^{\text{Ad}}$ such that for any vectors $v$ and $w$
$$g(A(v),w) = g(v,A^{\text{Ad}}(w))$$
If you express $A^{\text{Ad}}$ by its components with respect to a basis, you can check that
$${(A^{\text{Ad}})^{\mu}}_{\nu} = {A^{\alpha}}_{\beta}g^{\mu\beta}g_{\alpha\nu} =: {A_{\nu}}^{\mu}$$
where the $g_{\mu\nu}$ are the components of the metric tensor. That is why I don't particularly like the raising and lowering of indices: it hides the fact that there is a metric tensor involved, and it looks like you just interchanged the horizontal positions of the indices.
Now, if you are working in an orthonormal basis, the components of the metric tensor are $\delta_{\mu\nu}$ (i.e. a Kronecker delta), and then you can compute the adjoint of $A$ by simply interchanging the rows and columns of its matrix representative ${A^{\mu}}_{\nu}$. This operation of "flipping the matrix" came to be known as the transpose, but again, it only makes sense when you are working in an orthonormal basis.
The point here is that the concept you should be looking for is the adjoint of a linear transformation, and it only reduces to the "transpose" if you take its components with respect to an orthonormal frame.
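As a quick sanity check, here is a small numerical sketch (the metric and map values below are made up for illustration) verifying that the component formula above, which reads $A^{\text{Ad}} = g^{-1} A^{T} g$ in matrix form, satisfies the defining property of the adjoint, and that it collapses to the plain transpose when the metric is the identity:

```python
import numpy as np

# Hypothetical values for illustration: a symmetric, positive-definite
# (non-orthonormal) metric g and an arbitrary linear map A
g = np.array([[2.0, 1.0],
              [1.0, 3.0]])
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# The component formula (A^Ad)^mu_nu = A^alpha_beta g^{mu beta} g_{alpha nu}
# reads A_Ad = g^{-1} A^T g in matrix form
A_Ad = np.linalg.inv(g) @ A.T @ g

# Defining property g(A v, w) = g(v, A_Ad w), checked on random vectors
rng = np.random.default_rng(0)
v, w = rng.standard_normal(2), rng.standard_normal(2)
lhs = (A @ v) @ g @ w
rhs = v @ g @ (A_Ad @ w)
assert np.isclose(lhs, rhs)

# With the identity (orthonormal) metric the adjoint is just the transpose
assert np.allclose(np.linalg.inv(np.eye(2)) @ A.T @ np.eye(2), A.T)
```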
For more details you can check my answer to this question, where I treat exactly this kind of issue.
Okay, by request of @MathAsFun, I will add an example.
Let $V$ be an $n$-dimensional vector space with metric $g\in T^{0,2}V$, and take a linear map $\phi:V\to V$.
We now choose an orthonormal basis $\{e_{\mu}\}_{\mu\in I_{n}} \subseteq V$ (where $I_n$ stands for the set $\{1,\dots,n\}$).
We get the components $g_{\mu\nu}$ of the metric $g$ by applying it to the basis vectors pairwise (i.e., $g_{\mu\nu} := g(e_{\mu},e_{\nu})$), and since the basis is orthonormal, we have
$$g_{\mu\nu} = \delta_{\mu\nu} := \begin{cases}1 & \mu = \nu \\ 0 & \mu \neq \nu \end{cases}$$
The components of $\phi^{\text{Ad}}$ are related to those of $\phi$ by
$${(\phi^{\text{Ad}})^{\mu}}_{\nu} = {\phi^{\alpha}}_{\beta}g^{\mu\beta}g_{\alpha\nu}$$
Now, for concreteness, let's say $n = 2$ and calculate the components ${(\phi^{\text{Ad}})^{\mu}}_{\nu}$:
$$\begin{align}
{(\phi^{\text{Ad}})^{1}}_{1} &= {\phi^{\alpha}}_{\beta}g^{1\beta}g_{\alpha 1} = {\phi^{1}}_{1}g^{11}g_{11} = {\phi^{1}}_{1}\\
{(\phi^{\text{Ad}})^{1}}_{2} &= {\phi^{\alpha}}_{\beta}g^{1\beta}g_{\alpha 2} = {\phi^{2}}_{1}g^{11}g_{22} = {\phi^{2}}_{1}\\
{(\phi^{\text{Ad}})^{2}}_{1} &= {\phi^{\alpha}}_{\beta}g^{2\beta}g_{\alpha 1} = {\phi^{1}}_{2}g^{22}g_{11} = {\phi^{1}}_{2}\\
{(\phi^{\text{Ad}})^{2}}_{2} &= {\phi^{\alpha}}_{\beta}g^{2\beta}g_{\alpha 2} = {\phi^{2}}_{2}g^{22}g_{22} = {\phi^{2}}_{2}
\end{align}$$
If you write the components ${\phi^{\mu}}_{\nu}$ and ${(\phi^{\text{Ad}})^{\mu}}_{\nu}$ as matrices, you can see that one really is the transpose of the other.
Again, as you can see from the calculations, this only holds in the case of an orthonormal basis. Otherwise, the summation would pick up non-zero off-diagonal components of the metric ($g_{12}$ and $g_{21}$), or diagonal components ($g_{11}$ and $g_{22}$) different from $1$, and that would of course destroy this property.
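To see the failure concretely, here is a small numerical sketch (the metric and map values are chosen arbitrarily) with a non-orthonormal metric: the adjoint still satisfies its defining property, but it is no longer the transpose of the matrix:

```python
import numpy as np

# Arbitrary symmetric, positive-definite metric with g_12 = g_21 != 0
g = np.array([[1.0, 0.5],
              [0.5, 2.0]])
phi = np.array([[0.0, 1.0],
                [0.0, 0.0]])

# Component formula in matrix form: phi_Ad = g^{-1} phi^T g
phi_Ad = np.linalg.inv(g) @ phi.T @ g

# The defining property g(phi v, w) = g(v, phi_Ad w) still holds...
rng = np.random.default_rng(0)
v, w = rng.standard_normal(2), rng.standard_normal(2)
assert np.isclose((phi @ v) @ g @ w, v @ g @ (phi_Ad @ w))

# ...but the adjoint is no longer the plain transpose of the matrix:
assert not np.allclose(phi_Ad, phi.T)
```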
I've seen in the literature the notation $C$ with various additional decorations for contraction maps of all sorts, with the amount of decoration on the symbol $C$ varying with the context. See, e.g., A. Gray, Tubes, p. 56, where these maps are used in the case of somewhat special tensors, and the notation is therefore simpler.
In general, there is a whole family of uniquely defined maps
$$
C^{(r,s)}_{p,q} \colon \otimes^{r}_{s} V \to \otimes^{r-1}_{s-1} V
$$
which are collectively called tensor contractions ($1 \le p \le r, 1 \le q \le s$).
These maps are uniquely characterized by making the following diagrams commutative:
$$
\require{AMScd}
\begin{CD}
\times^{r}_{s} V @> {P^{(r,s)}_{p,q}} >> \times^{r-1}_{s-1} V\\
@V{\otimes^{r}_{s}}VV @VV{\otimes^{r-1}_{s-1}}V \\
\otimes^{r}_{s} V @>{C^{(r,s)}_{p,q}}>> \otimes^{r-1}_{s-1} V
\end{CD}
$$
Explanations are in order.
Recall that the tensor products $\otimes^{r}_{s} V$ are equipped with the universal maps
$$
\otimes^{r}_{s} \colon \times^{r}_{s} V \to \otimes^{r}_{s} V
$$
where $\times^{r}_{s} V := ( \times^r V) \times (\times^s V^*)$.
Besides that, there is a canonical pairing $P$ between a vector space $V$ and its dual:
$$
P \colon V \times V^* \to \mathbb{R} \colon (v, \omega) \mapsto \omega(v)
$$
Notice that the map $P$ is bilinear and can be extended to a family of multilinear maps
$$
P^{(r,s)}_{p,q} \colon \times^{r}_{s} V \to \times^{r-1}_{s-1} V
$$
by the formula:
$$
P^{(r,s)}_{p,q} (v_1, \dots, v_p, \dots, v_r, \omega_1, \dots, \omega_q, \dots, \omega_s) = \omega_q (v_p) (v_1, \dots, \widehat{v_p}, \dots, v_r, \omega_1, \dots, \widehat{\omega_q}, \dots, \omega_s)
$$
where a hat means omission.
Since the maps $P^{(r,s)}_{p,q}$ become multilinear once composed with $\otimes^{r-1}_{s-1}$ (the scalar factor $\omega_q(v_p)$ then acts by honest scalar multiplication on the tensor product), the universal property of the map $\otimes^{r}_{s}$ implies that there is a uniquely defined linear map
$$
C^{(r,s)}_{p,q} \colon \otimes^{r}_{s} V \to \otimes^{r-1}_{s-1} V
$$
satisfying
$$
C^{(r,s)}_{p,q} \circ \otimes^{r}_{s} = \otimes^{r-1}_{s-1} \circ P^{(r,s)}_{p,q},
$$
which is precisely the commutativity of the diagram above.
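In components, $C^{(r,s)}_{p,q}$ is the familiar index contraction: sum the $p$-th contravariant index against the $q$-th covariant index. Here is a small numerical sketch (the dimensions and variable names are my own, for illustration) checking the behaviour on simple tensors, where the contraction must produce the pairing $\omega_q(v_p)$ times the remaining factors:

```python
import numpy as np

rng = np.random.default_rng(1)

# (1,1)-tensors: the components T^i_j form a matrix, and C^{(1,1)}_{1,1}
# is the trace.
v = rng.standard_normal(3)        # a vector
omega = rng.standard_normal(3)    # a covector
T = np.outer(v, omega)            # simple tensor v (x) omega, T^i_j = v^i omega_j
assert np.isclose(np.trace(T), omega @ v)   # contraction = pairing omega(v)

# (2,1)-tensors S^{ij}_k: contracting the first upper slot with the
# (only) lower slot gives (C^{(2,1)}_{1,1} S)^j = S^{ij}_i.
u, x, om = (rng.standard_normal(3) for _ in range(3))
S = np.einsum('i,j,k->ijk', u, x, om)       # simple tensor u (x) x (x) om
contracted = np.einsum('iji->j', S)
# On the simple tensor this must equal om(u) * x, as the diagram demands:
assert np.allclose(contracted, (om @ u) * x)
```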
The linked post comes from Physics.SE and, in physics, the distinction between indices which label the entry in a multi-dimensional array and abstract indices is not always made.
In the first case (indices as concrete labels for array entries), we are only dealing with an equality between matrices, which happens to hold in any basis. This is possible because the two spaces $V^*\otimes W$ and $W\otimes V^*$ are canonically isomorphic.
To deal with this in abstract index notation, we adopt the convention that permutations of indices represent the corresponding braiding maps.
If $T \in V\otimes W$ and $R \in W\otimes V$, then $T_{ab} = R_{ba}$ means $T = \tau_{(12)}R$.
In our case, we have $A^T = \tau_{(12)}A$ with $\tau_{(12)}$ the braiding map $W\otimes V^*\to V^*\otimes W$.
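On component arrays, a braiding map is just an axis swap, which is why the convention is harmless in practice. A minimal sketch (dimensions and values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
# Components A^a_b of an element of W (x) V*, with dim W = 2, dim V = 3
A = rng.standard_normal((2, 3))

# tau_(12) swaps the two tensor factors; on arrays it is an axis transposition
A_braided = np.swapaxes(A, 0, 1)   # element of V* (x) W, shape (3, 2)

# The index rule T_ab = R_ba, entry by entry:
for a in range(3):
    for b in range(2):
        assert A_braided[a, b] == A[b, a]

# For a 2-index array, the axis swap is exactly the matrix transpose
assert np.allclose(A_braided, A.T)
```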