Very briefly. The line of reasoning is the following: the acceleration $A^{\mu}$ is GR is
rather formal construction, it is the covariant derivative of the speed
$U^{\mu}$ with respect to some natural parameter $\lambda$ which parameterize
a trajectory. For massive particles you can choose this parameter to be proper
time $d\tau$ thus $A^{\mu}=DU^{\mu}/d\tau$, although it is not possible for
massless particles, for which $d\tau=0$.
Therefore your question is related to the following one: what is $DU^{\mu}$? It
turns out that the simplest (and only) way to construct the covariant
differential of a vector field is to compare two infinitesimally separated
vectors in the same point (it is essential), e.g., the vector $U^{\mu}\left(
x^{\alpha}+dx^{\alpha}\right) $ and the vector $U^{\mu}\left( x^{\alpha
}\right) $ which should be subject to a parallel translation to the point
$x^{\alpha}+dx^{\alpha}$. After the parallel translation we obtain a new
infinitisimally close vector $U^{\mu\prime}=U^{\mu}\left( x^{\alpha}\right)
+\delta U^{\mu}$, thus
$$
DU^{\mu}=U^{\mu}\left( x^{\alpha}+dx^{\alpha}\right) -U^{\mu\prime}=dU^{\mu
}-\delta U^{\mu},
$$
where $dU^{\mu}=U^{\mu}\left( x^{\alpha}+dx^{\alpha}\right) -U^{\mu}\left(
x^{\alpha}\right) $ is the ordinary differential. Therefore, the small addition
$\delta U^{\mu}$ is the result of parallel translation. There are two obvious
properties of $\delta U^{\mu}$: it should be linear in $U^{\mu}$ and should
vanish with $dx^{\mu}\rightarrow0$. Therefore one can represent $\delta
U^{\mu}$ as follows:
$$
\delta U^{\mu}=-\Gamma_{\alpha\beta}^{\mu}U^{\alpha}dx^{\beta},
$$
where $\Gamma_{\alpha\beta}^{\mu}$ is the set of some matrices usually referred
as “connection coefficients” or “Christoffel symbols”.
Using $\Gamma$, one can generalize the covariant differential $D$ to any tensor quantities.
Although, there are no additional mathematical requirements on $\Gamma$, there
are physical ones in GR — the equivalence principle requires that $\Gamma$
should be symmetric $\Gamma_{\alpha\beta}^{\mu}=\Gamma_{\beta\alpha}^{\mu}$
and $Dg_{\alpha\beta}=0$. The last condition results in
$$
\partial_{\mu}g_{\alpha\beta}=-\left( \Gamma_{\mu,\beta\alpha}+\Gamma
_{\beta,\mu\alpha}\right) ,
$$
where $\Gamma_{\mu,\beta\alpha}=g_{\mu\rho}\Gamma_{\beta\alpha}^{\rho}$. Using
the condition that $\Gamma$ is symmetric one can find:
$$
\Gamma_{\beta\alpha}^{\rho}=\frac{1}{2}g^{\rho\sigma}\left( \partial_{\beta
}g_{\sigma\alpha}+\partial_{\alpha}g_{\sigma\beta}-\partial_{\sigma}
g_{\alpha\beta}\right).
$$
Let's now consider a parameterized trajectory $x^{\mu}\left( \lambda\right)$, the contravariant vector called speed is $U^{\mu}=dx^{\mu}\left(\lambda\right)/d\lambda$, therefore the contravariant acceleration takes the
form:
$$
A^{\mu}=\frac{DU^{\mu}}{d\lambda}=\frac{dU^{\mu}}{d\lambda}-\frac{\delta
U^{\mu}}{d\lambda}=\frac{dU^{\mu}}{d\lambda}+\Gamma_{\alpha\beta}^{\mu
}U^{\alpha}U^{\beta}.
$$
If we choose $d\lambda=d\tau$ then in the locally-inertal frame ($\Gamma=0$)
for the trajectory $x^{\mu}\left( \lambda\right) $ the acceleration $A^{\mu
}$ coincides with the ordinary one $\left( 0,\mathbf{a}\right) $.
And vice versa, $DU^{\mu}=0$ means that $U^{\mu}$ is constant in the
locally-internal frame although it does imply that it is constant in any
other frame, in fact $dU^{\mu}=\delta U^{\mu}$ implies that a free-fall
trajectory is actually a parallel translation in GR, which (for an external
observer) looks like the action of gravitational forces.
It took me a couple of days to figure it out, but I can now answer my own question:
The problem with directly substituting the expression for $\delta g_{\mu\nu}$ into that for ${\delta \Gamma^\alpha}_{\beta\mu}$ is that there is no obvious source for the term $\xi^\sigma \partial_\sigma { \Gamma^\alpha}_{\beta\mu}$ that has to be the leading term in the Lie derivative of the connection. We can however proceed as follows
$$
\delta {\Gamma^{\alpha}}_{\beta\mu}= \frac 12 g^{\alpha\lambda}(\nabla_\beta(\nabla_\lambda \xi_\mu+\nabla_\mu\xi_\lambda)+ \nabla_\mu(\nabla_\beta \xi_\lambda+\nabla_\lambda\xi_\beta) -\nabla_\lambda(\nabla_\beta \xi_\mu+\nabla_\mu\xi_\beta))
$$
$$
=\frac 12 (\nabla_\beta\nabla_\mu+\nabla_\mu\nabla_\beta)\xi^\alpha +\frac 12 g^{\lambda\alpha}([\nabla_\beta,\nabla_\lambda]\xi_\mu + [\nabla_\mu,\nabla_\lambda]\xi_\beta)
$$
$$
=\frac 12 (\nabla_\beta\nabla_\mu+\nabla_\mu\nabla_\beta)\xi^\alpha +\frac 12 g^{\lambda\alpha}(-{R^\sigma}_{\mu\beta\lambda}- {R^\sigma}_{\beta\mu\lambda})\xi_\sigma
$$
$$
=\frac 12 (\nabla_\beta\nabla_\mu+\nabla_\mu\nabla_\beta)\xi^\alpha +\frac 12 \xi^\sigma({R^\alpha}_{\beta\sigma \mu}+ {R^\alpha}_{\mu\sigma \beta} )
$$
$$
=\xi^\sigma \partial_\sigma {\Gamma^{\alpha}}_{\beta\mu} +(\partial_\beta \xi^\lambda ){\Gamma^\alpha}_{\lambda\mu}+(\partial_\mu\xi^\lambda) {\Gamma^\alpha}_{\beta \lambda}- (\partial_\lambda \xi^\alpha) {\Gamma^{\lambda }}_{\beta\mu}+ \partial^2_{\beta\mu} \xi^\alpha
$$
$$
\equiv {({\mathcal L}_\xi \Gamma)^{\alpha}}_{\beta\mu}
$$
In passing from the third line to the fourth we have used some symmetries of the Riemann tensor. We have then (somewhat tediously) substituted the usual expression for the curvature and the covariant derivatives in the last line. I end up with almost the same expression as Bardeen and Zumino, but with a sign difference in the inhomogenous part of the Christoffel symbol transformation
The net result is that first transforming the metric under the diffeomorphism and then computing the connection gets the same result as first computing the connection and then transforming it under a diffeomorphism. This had to be true of course, but there are many places where one can get signs wrong...!
Best Answer
It doesn't. The covariant derivative is a map from $(k,l)$ tensors to $(k,l+1)$ tensors that satisfies certain basic properties. As such it cannot act on anything except tensors. The collection of components $\Gamma^a_{bc}$ does not constitute a tensor.
If you got to this expression via something like $$ \nabla_d(\nabla_b A^a) = \nabla_d(\partial_b A^a) + \nabla_d(\Gamma^a_{bc} A^c), \tag{not recommended} $$ the problem is evaluating from the inside out. In order to express $\nabla_d$ in terms of partial derivatives and connection coefficients, you should imagine it acting on some arbitrary tensor with components $T_b{}^a$ first, and then later substitute $T_b{}^a = \nabla_b A^a$: $$ \nabla_d(\nabla_b A^a) = \partial_d(\nabla_b A^a) + \Gamma^a_{de} \nabla_b A^e - \Gamma^e_{db} \nabla_e A^a. $$
Now it turns out you'll get the same 6 or 8 terms fully expanded if you work the other way, treating $\Gamma^a_{bc} A^c$ as a (2,2) tensor and just applying the rules for covariant differentiation of such a thing, but I'm not sure this always works, and certainly the intermediate steps don't have any natural geometric interpretation.