@Prahar is right, the variation of the Christoffel symbol is a tensor, even if the Christoffel itself is not. We have
$\delta \Gamma^\rho_{\mu\nu}=\frac{1}{2}\delta\bigg(g^{\rho\alpha}(2\partial_{(\mu}g_{\nu)\alpha}-\partial_\alpha g_{\mu\nu})\bigg)=\frac{1}{2}\delta g^{\rho\alpha}(2\partial_{(\mu}g_{\nu)\alpha}-\partial_\alpha g_{\mu\nu})+ \frac{1}{2}g^{\rho\alpha}(2\partial_{(\mu}\delta g_{\nu)\alpha}-\partial_\alpha \delta g_{\mu\nu})$
where $A_{(\mu\nu)}=\frac{1}{2}(A_{\mu\nu}+A_{\nu\mu})$. Using $\delta g^{\rho\alpha}=-g^{\rho\gamma}g^{\alpha\delta}\delta g_{\gamma\delta}$ we have:
$\delta \Gamma^\rho_{\mu\nu}=\frac{1}{2}g^{\rho\alpha}(2\partial_{(\mu}\delta g_{\nu)\alpha}-\partial_\alpha \delta g_{\mu\nu}-2\Gamma_{\mu\nu}^\beta\delta g_{\alpha\beta})$
The Christoffel then combines nicely with the standard derivative to give a covariant tensor (the other Christoffel symbols cancel each other)
$\delta \Gamma^\rho_{\mu\nu}=\frac{1}{2}g^{\rho\alpha}(2\nabla_{(\mu}\delta g_{\nu)\alpha}-\nabla_\alpha \delta g_{\mu\nu})$.
So to answer the original question, we finally have:
$\nabla_\mu V_\nu=\nabla_\mu \delta V_\nu-\frac{1}{2}g^{\rho\alpha}(2\nabla_{(\mu}\delta g_{\nu)\alpha}-\nabla_\alpha \delta g_{\mu\nu})A_\rho$
Remember that we did not assume anything on $V_\mu$. Depending on the problem, it is then possible to integrate by parts to isolate $\delta g_{\mu\nu}$ and obtain the energy momentum tensor.
All formulas you shown above are using abstract index notation except the third formula which is fully expressed is a basis.
For a vector field, you can write for example
$$V = V^\mu e_\mu\;,$$
where $V^\mu$ is a scalar while $e_\mu$ is a vector basis. This is a sort of confusion because in abstract index notation we view $V^\mu$ as a vector field.
When we take the covariant derivative, it reads
$$\nabla_\mu V=\nabla_\mu (V^\nu e_\nu) = \nabla_\mu (V^\nu) e_\nu + V^\nu \nabla_\mu ( e_\nu)$$
\begin{eqnarray}
&=& \partial_\mu (V^\nu) e_\nu + V^\nu \Gamma_\mu{}^\lambda{}_\nu e_\lambda\;,\\
&=&\big( \partial_\mu V^\nu +\Gamma_\mu{}^\nu{}_\lambda V^\lambda\big)e_\nu
\end{eqnarray}
If we define
$$\nabla_\mu V =: (\nabla_\mu V^\nu) e_\nu $$
we will have the relation in the abstract index notation
$$\nabla_\mu V^\nu = \partial_\mu V^\nu +\Gamma_\mu{}^\nu{}_\lambda V^\lambda\;.$$
(More general, you can start with $\nabla V$ and then define $\nabla V =:(\nabla_\mu V^\nu) e^\mu \otimes e_\nu$ )
Next, the metric $g$, it is (o,2) tensor so it has two slots for inserting 2 vectors if we insert the basis into these slots we will get a component of the metric tensor which is a scalar field
$$g(e_\mu, e_\nu)=g_{\mu\nu}$$
($g= g_{\alpha\beta} e^\alpha \otimes e^\beta,\;g(e_\mu, e_\nu) =g_{\alpha\beta} e^\alpha(e_\mu) \otimes e^\beta(e_\nu) =g_{\alpha\beta}\delta^\alpha_\mu \delta^\beta_\nu= g_{\mu\nu} $)
It is also usually define that $\eta(A,B):= A\cdot B$, $\eta$ is a Minkowskian metric
$A\cdot B$ is a scalar so invariants under coordinate transformations
$$A\cdot B =\eta(A,B) \equiv \eta_{IJ} A^I B^J$$
$$= g(A,B) \equiv g_{\mu\nu} A^\mu B^\nu$$
where $A^I = e^I_\mu A^\mu$ for some scalar $e^I_\mu$ (a vierbein), and you can easily prove that $g_{\mu\nu} = \eta_{IJ} e^I_\mu e^J_\nu$.
So now we have
$$e_\mu \cdot e_\nu =\eta_{IJ}e^I_\mu e^J_\nu= g_{\mu\nu}$$
In this step, we can view $\eta_{IJ},g_{\mu\nu}$ as the scalar fields $e^I_\mu$ as a vector field
\begin{eqnarray}
\nabla_\gamma g_{\alpha\beta} (= \partial _\gamma g_{\alpha\beta})&=& \nabla_\gamma (e_\alpha\cdot e_\beta) \\
&=&\eta(\nabla_\gamma e_\alpha,e_\beta) +\eta(e_\alpha,\nabla_\gamma e_\beta)
\equiv \eta_{IJ}\nabla_\gamma(e^I_\alpha) e^J_\beta + \eta_{IJ}e^I_\alpha \nabla_\gamma(e^J_\beta) \\
&=&\eta(\Gamma_\gamma{}^\rho{}_\alpha e_\rho,e_\beta) +\eta(e_\alpha,\Gamma_\gamma{}^\sigma{}_\beta e_\sigma)
\equiv \eta_{IJ}\Gamma_\gamma{}^\rho{}_\alpha e^I_\rho e^J_\beta + \eta_{IJ}e^I_\alpha \Gamma_\gamma{}^\sigma{}_\beta e^J_\sigma \\
&=&\Gamma_\gamma{}^\rho{}_\alpha \eta( e_\rho,e_\beta) + \Gamma_\gamma{}^\sigma{}_\beta\eta(e_\alpha, e_\sigma)
\equiv \Gamma_\gamma{}^\rho{}_\alpha \eta_{IJ} e^I_\rho e^J_\beta + \Gamma_\gamma{}^\sigma{}_\beta \eta_{IJ}e^I_\alpha e^J_\sigma \\
&=& \Gamma_\gamma{}^\rho{}_\alpha e_\rho \cdot e_\beta + \Gamma_\gamma{}^\sigma{}_\beta e_\alpha \cdot e_\sigma \equiv \Gamma_\gamma{}^\rho{}_\alpha g_{\rho \beta} + \Gamma_\gamma{}^\sigma{}_\beta g_{\alpha \sigma}
\end{eqnarray}
Note: Not fully detailed as much as possible but may be helpful for you.
Best Answer
It took me a couple of days to figure it out, but I can now answer my own question:
The problem with directly substituting the expression for $\delta g_{\mu\nu}$ into that for ${\delta \Gamma^\alpha}_{\beta\mu}$ is that there is no obvious source for the term $\xi^\sigma \partial_\sigma { \Gamma^\alpha}_{\beta\mu}$ that has to be the leading term in the Lie derivative of the connection. We can however proceed as follows $$ \delta {\Gamma^{\alpha}}_{\beta\mu}= \frac 12 g^{\alpha\lambda}(\nabla_\beta(\nabla_\lambda \xi_\mu+\nabla_\mu\xi_\lambda)+ \nabla_\mu(\nabla_\beta \xi_\lambda+\nabla_\lambda\xi_\beta) -\nabla_\lambda(\nabla_\beta \xi_\mu+\nabla_\mu\xi_\beta)) $$ $$ =\frac 12 (\nabla_\beta\nabla_\mu+\nabla_\mu\nabla_\beta)\xi^\alpha +\frac 12 g^{\lambda\alpha}([\nabla_\beta,\nabla_\lambda]\xi_\mu + [\nabla_\mu,\nabla_\lambda]\xi_\beta) $$ $$ =\frac 12 (\nabla_\beta\nabla_\mu+\nabla_\mu\nabla_\beta)\xi^\alpha +\frac 12 g^{\lambda\alpha}(-{R^\sigma}_{\mu\beta\lambda}- {R^\sigma}_{\beta\mu\lambda})\xi_\sigma $$ $$ =\frac 12 (\nabla_\beta\nabla_\mu+\nabla_\mu\nabla_\beta)\xi^\alpha +\frac 12 \xi^\sigma({R^\alpha}_{\beta\sigma \mu}+ {R^\alpha}_{\mu\sigma \beta} ) $$
$$ =\xi^\sigma \partial_\sigma {\Gamma^{\alpha}}_{\beta\mu} +(\partial_\beta \xi^\lambda ){\Gamma^\alpha}_{\lambda\mu}+(\partial_\mu\xi^\lambda) {\Gamma^\alpha}_{\beta \lambda}- (\partial_\lambda \xi^\alpha) {\Gamma^{\lambda }}_{\beta\mu}+ \partial^2_{\beta\mu} \xi^\alpha $$ $$ \equiv {({\mathcal L}_\xi \Gamma)^{\alpha}}_{\beta\mu} $$ In passing from the third line to the fourth we have used some symmetries of the Riemann tensor. We have then (somewhat tediously) substituted the usual expression for the curvature and the covariant derivatives in the last line. I end up with almost the same expression as Bardeen and Zumino, but with a sign difference in the inhomogenous part of the Christoffel symbol transformation
The net result is that first transforming the metric under the diffeomorphism and then computing the connection gets the same result as first computing the connection and then transforming it under a diffeomorphism. This had to be true of course, but there are many places where one can get signs wrong...!