Let me explain here an actually useful characterisation of a "tensor", one less old-fashioned than the usual story of how a "tensor" transforms under a change of coordinates. In the process I hope to clarify why the difference of two connexions is a "tensor".
I shall assume smoothness everywhere.
A connexion, as defined by the Original Poster, is a map $D:\Gamma(E)\to\Gamma(E\otimes T^*M)$ taking smooth sections of the vector bundle $E$ to differential $1$-forms with values in sections of $E$. One can see this by recognising that $\Gamma(E\otimes T^*M)\cong\Gamma(E)\otimes_{C^\infty(M;\mathbb{R})}\Gamma(T^*M)\cong\mathrm{Hom}_{C^\infty(M;\mathbb{R})}(\mathfrak{X}(M;\mathbb{R});\Gamma(E))$, where I identify the $C^\infty(M;\mathbb{R})$-module $\Gamma(TM)$ of sections of the tangent bundle with the vector fields on $M$, viewed as derivations on the algebra $C^\infty(M;\mathbb{R})$. I do recommend Lee's *Introduction to Smooth Manifolds*, or Wald's *General Relativity*, if one is not used to these notions.
That being said, let $s\in\Gamma(E)$ be a section of $E$, $X\in\mathfrak{X}(M;\mathbb{R})$ a vector field on $M$, and $f\in C^\infty(M;\mathbb{R})$ a smooth function. Applying the definition of a connexion (as given by the Original Poster), one readily sees that
$$Ds(fX)=fDs(X)$$
and
$$Dfs(X)=X(f)s+fDs(X) \ .$$
Remark: this last property is usually taken as the defining property of a connexion on a vector bundle when the connexion is defined globally, as opposed to the Original Poster's local definition.
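To see these two rules in action, here is a minimal sympy sketch for the simplest case I can think of: a trivial line bundle over $\mathbb{R}^2$, where a connexion can be written as $Ds(X)=X(s)+A(X)s$ for a $1$-form $A$. The concrete choices of $s$, $f$, $X$ and $A$ below are arbitrary illustrations of mine, not part of the definitions above.

```python
import sympy as sp

x, y = sp.symbols('x y')

# Arbitrary choices: a section s and a function f on R^2,
# a vector field X = (X1, X2), and a connexion 1-form A = (A1, A2).
s = sp.exp(x) * sp.sin(y)
f = x**2 + y
X = (y, x * y)
A = (sp.cos(x), x + y**2)

def apply_field(V, h):
    """A vector field acting on a function, as a derivation."""
    return V[0] * sp.diff(h, x) + V[1] * sp.diff(h, y)

def D(sec, V):
    """Connexion on the trivial line bundle: D sec (V) = V(sec) + A(V) sec."""
    return apply_field(V, sec) + (A[0] * V[0] + A[1] * V[1]) * sec

fX = (f * X[0], f * X[1])

# C^infinity-linearity in the vector-field slot: D s (fX) = f D s (X)
print(sp.simplify(D(s, fX) - f * D(s, X)))  # 0

# Leibniz rule in the section slot: D(fs)(X) = X(f) s + f D s (X)
print(sp.simplify(D(f * s, X) - (apply_field(X, f) * s + f * D(s, X))))  # 0
```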
Now, if $D$ and $D'$ are two connexions defined on $E$, their difference satisfies
$$(D-D')s(fX)=f(D-D')s(X)$$
and
$$(D-D')fs(X)=f(D-D')s(X) \ .$$ It is this last equality (which does not hold for either connexion alone) that characterises their difference as a "tensor": the difference is in fact a differential $1$-form taking values in sections of $E$.
The reason the difference of two connexions satisfies $(D-D')fs(X)=f(D-D')s(X)$ is that each connexion contributes the same extra term $X(f)s$, which cancels in the difference: the Leibniz rule $Dfs(X)=X(f)s+fDs(X)$ holds for $D$ and $D'$ alike.
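Written out, applying the Leibniz rule to each connexion and subtracting:
$$(D-D')fs(X) = \bigl(X(f)s + fDs(X)\bigr) - \bigl(X(f)s + fD's(X)\bigr) = f(D-D')s(X) \ .$$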
A "tensor" is just a mapping that when its arguments are multiplied by functions, it behaves as a linear mapping regarding this $C^\infty(M;\mathbb{R})$-module structure. This is not as precise as I like, but let me show an example with the metric "tensor" (a riemanninan structure $\mathrm{g}$): with vector fields $X,Y,Z\in\mathfrak{X}(M;\mathbb{R})$ and smooth function $f\in C^\infty(M;\mathbb{R})$, the riemannian metric satisfies $\mathrm{g}(X+fY,Z)=\mathrm{g}(X,Z)+f\mathrm{g}(Y,Z)$.
This type of behaviour with respect to multiplication by functions guarantees that these "tensors" depend only on what happens at a point, and the way they transform under diffeomorphisms (changes of coordinates) can be deduced from that property.
Homework for the Original Poster: since you know how covariant and contravariant tensors transform under a change of coordinates, and I claimed that the difference of two connexions is essentially a $1$-form with values in sections of $E$: pick a local basis of sections of $E$ and another for the vector fields on $M$, and work out how the values of $Ds(X)$ and $(D-D')s(X)$ change under a change of coordinates. Hint: exploit how the connexions behave when their arguments are multiplied by functions.
The short answer is that the order of the indices does matter, and that is because when you introduce a metric tensor you (or some people) are constantly raising and lowering indices.
A lot of authors say "transpose" when they really mean the adjoint. The adjoint of a map $A$ with respect to a metric $g$ is the linear transformation $A^{\text{Ad}}$ such that for any vectors $v$ and $w$
$$g(A(v),w) = g(v,A^{\text{Ad}}(w))$$
If you express $A^{\text{Ad}}$ by its components with respect to a basis, you can check that
$${(A^{\text{Ad}})^{\mu}}_{\nu} = {A^{\alpha}}_{\beta}g^{\mu\beta}g_{\alpha\nu} =: {A_{\nu}}^{\mu}$$
where the $g_{\mu\nu}$ are the components of the metric tensor and the $g^{\mu\nu}$ those of its inverse. That is why I don't particularly like the raising and lowering of indices: it hides the fact that there is a metric tensor involved, and it makes it look as if you had merely interchanged the horizontal positions of the indices.
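For completeness, here is the check hinted at above. In components, the defining relation reads $g_{\alpha\nu}{A^{\alpha}}_{\beta}v^{\beta}w^{\nu} = g_{\beta\mu}v^{\beta}{(A^{\text{Ad}})^{\mu}}_{\nu}w^{\nu}$ for all $v$ and $w$; equating the coefficients of $v^{\beta}w^{\nu}$ and contracting with $g^{\rho\beta}$ yields $${(A^{\text{Ad}})^{\rho}}_{\nu} = {A^{\alpha}}_{\beta}g^{\rho\beta}g_{\alpha\nu} \ ,$$ which is the formula above up to renaming the free index. In matrix notation, writing $G$ for the matrix of the $g_{\mu\nu}$, this says $A^{\text{Ad}} = G^{-1}A^{\top}G$, which makes it plain that the metric does not drop out unless $G$ is the identity.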
Now, if you are working in an orthonormal basis, the components of the metric tensor are $\delta_{\mu\nu}$ (i.e. a Kronecker delta), and you can then compute the adjoint of $A$ by simply interchanging the rows and columns of its matrix representative ${A^{\mu}}_{\nu}$. This operation of "flipping the matrix" came to be known as the transpose, but again, it only makes sense when you are working in an orthonormal basis.
The point here is that the concept you should be looking for is the adjoint of a linear transformation, and it only reduces to the "transpose" if you take its components with respect to an orthonormal frame.
For more details you can check my answer to this question, where I treat exactly this kind of issue.
Okay, at the request of @MathAsFun, I will add an example.
Let $V$ be an $n$-dimensional vector space with metric $g\in T^{0,2}V$, and take a linear map $\phi:V\to V$.
We now choose an orthonormal basis $\{e_{\mu}\}_{\mu\in I_{n}} \subseteq V$ (where $I_n$ stands for the set $\{1,\dots,n\}$).
We get the components $g_{\mu\nu}$ of the metric $g$ by applying it to the basis pairwise (i.e., $g_{\mu\nu} := g(e_{\mu},e_{\nu})$), and since the basis is orthonormal, then
$$g_{\mu\nu} = \delta_{\mu\nu} := \begin{cases}1 & \mu = \nu \\ 0 & \mu \neq \nu \end{cases}$$
The components of $\phi^{\text{Ad}}$ are related to those of $\phi$ by
$${(\phi^{\text{Ad}})^{\mu}}_{\nu} = {\phi^{\alpha}}_{\beta}g^{\mu\beta}g_{\alpha\nu}$$
Okaaaaaay. Now, for concreteness, let's say $n = 2$, and let's calculate the components ${(\phi^{\text{Ad}})^{\mu}}_{\nu}$:
$$\begin{align}
{(\phi^{\text{Ad}})^{1}}_{1} &= {\phi^{\alpha}}_{\beta}g^{1\beta}g_{\alpha 1} = {\phi^{1}}_{1}g^{11}g_{11} = {\phi^{1}}_{1}\\
{(\phi^{\text{Ad}})^{1}}_{2} &= {\phi^{\alpha}}_{\beta}g^{1\beta}g_{\alpha 2} = {\phi^{2}}_{1}g^{11}g_{22} = {\phi^{2}}_{1}\\
{(\phi^{\text{Ad}})^{2}}_{1} &= {\phi^{\alpha}}_{\beta}g^{2\beta}g_{\alpha 1} = {\phi^{1}}_{2}g^{22}g_{11} = {\phi^{1}}_{2}\\
{(\phi^{\text{Ad}})^{2}}_{2} &= {\phi^{\alpha}}_{\beta}g^{2\beta}g_{\alpha 2} = {\phi^{2}}_{2}g^{22}g_{22} = {\phi^{2}}_{2}
\end{align}$$
If you write the components ${\phi^{\mu}}_{\nu}$ and ${(\phi^{\text{Ad}})^{\mu}}_{\nu}$ as matrices, you can see that one really is the transpose of the other.
Again, as you can see from the calculations, this only holds in the case of an orthonormal basis. Otherwise the summation would pick up non-zero off-diagonal components of the metric ($g_{12}$ and $g_{21}$), or diagonal components ($g_{11}$ and $g_{22}$) different from $1$, and that would of course destroy this property.
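For a numerical companion to the calculation above, here is a minimal numpy sketch; the particular matrix, the metric, and the helper name `adjoint` are arbitrary choices of mine.

```python
import numpy as np

# An arbitrary linear map phi on R^2 (components phi^mu_nu as a matrix).
phi = np.array([[1.0, 2.0],
                [3.0, 4.0]])

def adjoint(A, G):
    """Components of the adjoint w.r.t. the metric G:
    (A^Ad)^mu_nu = A^alpha_beta g^{mu beta} g_{alpha nu},
    i.e. G^{-1} A^T G in matrix form."""
    return np.linalg.inv(G) @ A.T @ G

# Orthonormal basis: G is the identity, and the adjoint is the plain transpose.
G_orth = np.eye(2)
print(np.allclose(adjoint(phi, G_orth), phi.T))   # True

# Non-orthonormal basis: an arbitrary symmetric positive-definite metric.
G_skew = np.array([[2.0, 1.0],
                   [1.0, 3.0]])
phi_ad = adjoint(phi, G_skew)
print(np.allclose(phi_ad, phi.T))                 # False: not the transpose

# But it is still the adjoint: g(phi v, w) = g(v, phi^Ad w).
rng = np.random.default_rng(0)
v, w = rng.standard_normal(2), rng.standard_normal(2)
print(np.isclose((phi @ v) @ G_skew @ w, v @ G_skew @ (phi_ad @ w)))  # True
```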
Best Answer
Let us rearrange your $$T'^{\alpha\beta} = \Lambda^\alpha{}_\gamma \Lambda^\beta{}_\delta T^{\gamma \delta},$$ into $$T'^{\alpha\beta} = \Lambda^\alpha{}_\gamma T^{\gamma \delta}\Lambda^\beta{}_\delta,$$ and go one step further: $$T'^{\alpha\beta} = \Lambda^\alpha{}_\gamma T^{\gamma \delta}(\Lambda^{\top})_\delta{}^\beta.$$
In this last equation one can clearly see that, to obtain the components of $T'$, one must multiply matrices according to $$T'=\Lambda T\Lambda^{\top}.$$
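For the sceptical reader, here is a minimal numpy sketch (with randomly chosen $\Lambda$ and $T$; the variable names are mine) confirming that the component contraction and the matrix product agree:

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.standard_normal((4, 4))  # components Lambda^alpha_gamma
T = rng.standard_normal((4, 4))  # components T^{gamma delta}

# Component form: T'^{ab} = Lambda^a_c Lambda^b_d T^{cd}
T_components = np.einsum('ac,bd,cd->ab', L, L, T)

# Matrix form: T' = Lambda T Lambda^T
T_matrix = L @ T @ L.T

print(np.allclose(T_components, T_matrix))  # True
```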