In general, if $T$ is an $(r, s)$ tensor field, then $\nabla T$ is an $(r, s+1)$-tensor field given by
$$(\nabla T)(\eta_1, \dots, \eta_r, X, Y_1, \dots, Y_s) = (\nabla_XT)(\eta_1, \dots, \eta_r, Y_1, \dots, Y_s)$$
where the latter expression is intrinsically defined by the equation
\begin{align*}
\nabla_X(T(\eta_1, \dots, \eta_r, Y_1, \dots, Y_s)) =&\ (\nabla_XT)(\eta_1, \dots, \eta_r, Y_1, \dots, Y_s)\\
&+ \sum_{i=1}^rT(\eta_1, \dots, \eta_{i-1}, \nabla_X\eta_i, \eta_{i+1}, \dots, \eta_r, Y_1, \dots, Y_s)\\
&+ \sum_{j=1}^sT(\eta_1, \dots, \eta_r, Y_1, \dots, Y_{j-1}, \nabla_XY_j, Y_{j+1}, \dots, Y_s).
\end{align*}
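For reference, the same definition can be written in components. This is a standard consequence, spelled out here as an illustration. Take $X=\partial_c$, $\eta_i=dx^{a_i}$, $Y_j=\partial_{b_j}$, and use $\nabla_{\partial_c}\partial_b=\Gamma^d_{cb}\partial_d$. That convention forces $\nabla_{\partial_c}dx^a=-\Gamma^a_{cd}\,dx^d$, which you can see by differentiating $dx^a(\partial_d)=\delta^a_d$ with the product rule. The result is
$$\nabla_cT^{a_1\cdots a_r}{}_{b_1\cdots b_s}=\partial_cT^{a_1\cdots a_r}{}_{b_1\cdots b_s}+\sum_{i=1}^r\Gamma^{a_i}_{cd}\,T^{a_1\cdots a_{i-1}da_{i+1}\cdots a_r}{}_{b_1\cdots b_s}-\sum_{j=1}^s\Gamma^d_{cb_j}\,T^{a_1\cdots a_r}{}_{b_1\cdots b_{j-1}db_{j+1}\cdots b_s},$$
so each upper index contributes a $+\Gamma$ term and each lower index a $-\Gamma$ term.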
In particular, if $T$ is a $(1, 1)$ tensor, then $(\nabla_XT)(\eta, Y)$ is intrinsically defined by the equation
$$\nabla_X(T(\eta, Y)) = (\nabla_XT)(\eta, Y) + T(\nabla_X\eta, Y) + T(\eta, \nabla_XY)$$
and therefore the correct expression for $(\nabla_XT)(\eta, Y)$ is
$$(\nabla_XT)(\eta, Y) = \nabla_X(T(\eta, Y)) - T(\nabla_X\eta, Y) - T(\eta, \nabla_XY).$$
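The following small numerical sketch checks the signs in the component version of this identity. It is my own illustration, not part of the answer above. It uses the component formulas $(\nabla_cT)^a{}_b=\partial_cT^a{}_b+\Gamma^a_{ck}T^k{}_b-\Gamma^k_{cb}T^a{}_k$, $(\nabla_c\eta)_a=\partial_c\eta_a-\Gamma^k_{ca}\eta_k$ and $(\nabla_cY)^b=\partial_cY^b+\Gamma^b_{ck}Y^k$, together with the polar-coordinate Christoffel symbols of flat $\mathbb R^2$. The test fields `T`, `eta`, `Y` are arbitrary hypothetical choices, and derivatives are taken by central finite differences. The Christoffel terms cancel pairwise in the sum, so the check confirms that the three sign conventions are mutually consistent with the defining equation. It does not test the particular values of $\Gamma$.

```python
import math

# Flat R^2 in polar coordinates (r, theta); index 0 = r, index 1 = theta.
# G[a][c][d] = Gamma^a_{cd}; the only nonzero symbols are
# Gamma^r_{theta theta} = -r and Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r.
def christoffel(r, th):
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

# Arbitrary smooth test fields (hypothetical choices; any smooth fields would do).
def T(r, th):    # (1,1) tensor, T(r, th)[a][b] = T^a_b
    return [[r * math.cos(th), th * th], [math.sin(r * th), r + th]]

def eta(r, th):  # covector, eta(r, th)[a] = eta_a
    return [r * r, math.cos(th)]

def Y(r, th):    # vector, Y(r, th)[b] = Y^b
    return [math.exp(-r), r * th]

def d(f, c, p, h=1e-6):
    """Central-difference partial derivative of a scalar function f(r, th) along coordinate c at p."""
    lo, hi = list(p), list(p)
    lo[c] -= h
    hi[c] += h
    return (f(*hi) - f(*lo)) / (2 * h)

def scalar(r, th):
    """The function T(eta, Y) = T^a_b eta_a Y^b."""
    t, e, y = T(r, th), eta(r, th), Y(r, th)
    return sum(t[a][b] * e[a] * y[b] for a in range(2) for b in range(2))

def leibniz_sides(p, c):
    """Return both sides of the (1,1) Leibniz identity at point p in coordinate direction c."""
    G = christoffel(*p)
    t, e, y = T(*p), eta(*p), Y(*p)
    idx = range(2)
    # (nabla_c T)^a_b = d_c T^a_b + Gamma^a_{ck} T^k_b - Gamma^k_{cb} T^a_k
    nT = [[d(lambda r, th: T(r, th)[a][b], c, p)
           + sum(G[a][c][k] * t[k][b] for k in idx)
           - sum(G[k][c][b] * t[a][k] for k in idx) for b in idx] for a in idx]
    # (nabla_c eta)_a = d_c eta_a - Gamma^k_{ca} eta_k
    ne = [d(lambda r, th: eta(r, th)[a], c, p) - sum(G[k][c][a] * e[k] for k in idx) for a in idx]
    # (nabla_c Y)^b = d_c Y^b + Gamma^b_{ck} Y^k
    nY = [d(lambda r, th: Y(r, th)[b], c, p) + sum(G[b][c][k] * y[k] for k in idx) for b in idx]
    lhs = d(scalar, c, p)  # on a scalar, nabla_c is just the partial derivative
    rhs = sum(nT[a][b] * e[a] * y[b] + t[a][b] * ne[a] * y[b] + t[a][b] * e[a] * nY[b]
              for a in idx for b in idx)
    return lhs, rhs

for p in [(1.3, 0.4), (2.0, -1.1)]:
    for c in range(2):
        lhs, rhs = leibniz_sides(p, c)
        print(p, c, lhs, rhs)
```

The two printed values in each row should agree to roughly the accuracy of the finite differences.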
You can formalize it as follows:
Given a "derivative operator" $\nabla$ (I will refer to it as a connection) and a curve $\gamma:\mathbb R\rightarrow M,\ \lambda\mapsto \gamma(\lambda)$, there exists a unique map $$ \frac{D}{d\lambda}:\Gamma(TM,\gamma)\rightarrow\Gamma(TM,\gamma),\ v\mapsto \frac{Dv}{d\lambda}, $$ mapping vector fields along $\gamma$ to vector fields along $\gamma$ and satisfying the properties below. Here $\Gamma(TM,\gamma)$ denotes the sections of $TM$ along $\gamma$, i.e. smooth maps $v:\mathbb R\rightarrow TM$ such that $\pi\circ v=\gamma$, where $\pi:TM\rightarrow M$ is the projection to the base. These are essentially vector fields along the curve.
Linearity: $D/d\lambda$ is $\mathbb R$-linear.
Product rule: For any $\alpha\in C^\infty(\mathbb R)$, we have $$ \frac{D}{d\lambda}(\alpha v)=\frac{d\alpha}{d\lambda}v+\alpha\frac{Dv}{d\lambda}.$$
Consistency: Let $V\in\Gamma(TM)$ be any smooth vector field, let $t=\frac{d}{d\lambda}$ be the tangent vector field of the curve $\gamma$, and define $v=V\circ\gamma$, which is now a vector field along $\gamma$. Then $$ \frac{Dv}{d\lambda}=(\nabla_tV)\circ\gamma, $$ where $\nabla_t$ is $t^a\nabla_a$ in Wald's notation (sorry for breaking the index notation, but it is quite counterintuitive here).
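Here is a small worked example of the consistency condition (my own illustration). On flat $\mathbb R^2$ in polar coordinates $(r,\theta)$, the only nonzero Christoffel symbols are $\Gamma^r_{\theta\theta}=-r$ and $\Gamma^\theta_{r\theta}=\Gamma^\theta_{\theta r}=1/r$. For the circle $\gamma(\lambda)=(R,\lambda)$, the tangent is $t=\partial_\theta\circ\gamma$. This has the form $V\circ\gamma$ with $V=\partial_\theta$, so consistency gives
$$\frac{Dt}{d\lambda}=(\nabla_{\partial_\theta}\partial_\theta)\circ\gamma=\left(\Gamma^r_{\theta\theta}\,\partial_r+\Gamma^\theta_{\theta\theta}\,\partial_\theta\right)\circ\gamma=-R\,(\partial_r\circ\gamma).$$
This is the centripetal acceleration of uniform circular motion. It points toward the origin and, since $\partial_r$ has unit length, has magnitude $R$ (the speed is $R$ and the angular velocity is $1$).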
The existence and uniqueness of $D/d\lambda$ are easy to prove, and the proof appears in many differential geometry books.
Physics books often omit any discussion of locality (as far as I remember, Wald's is mostly no exception). Even when they do discuss it, the relevant statements are often not proven explicitly.
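The point is that one can show that if the $D/d\lambda$ operator exists, then it is local in the following sense. Let $\lambda_0$ be a point in the domain of $\gamma$, and let $v$ and $\tilde{v}$ be two vector fields along $\gamma$ that agree on an open interval around $\lambda_0$ (say, $(\lambda_0-\epsilon,\lambda_0+\epsilon)$). Then $$ \frac{Dv}{d\lambda}(\lambda_0)=\frac{D\tilde v}{d\lambda}(\lambda_0). $$
This matters because a vector field $v$ along $\gamma$ need not be of the form $V\circ\gamma$ for any vector field $V$ on $M$: if $\gamma$ self-intersects, $v$ can take different values at the same point of $M$. For example, the figure-eight $\gamma(\lambda)=(\sin\lambda,\sin2\lambda)$ in $\mathbb R^2$ passes through the origin at $\lambda=0$ and at $\lambda=\pi$, with tangent vectors $(1,2)$ and $(-1,2)$ respectively. Its own tangent field is therefore not the restriction of any vector field on $\mathbb R^2$. However, a smooth curve with nonvanishing tangent is non-self-intersecting when restricted to a small enough interval around any parameter value, and on such an interval $v$ can be extended to a vector field $V$ defined on an open subset of $M$. Then, by locality, you can compute $\frac{Dv}{d\lambda}(\lambda_0)$ as $(\nabla_t V)(\gamma(\lambda_0))$. Doing this at every $\lambda_0$ defines $Dv/d\lambda$ along the whole curve.
Wald is somewhat hand-wavy here and simply skips this construction.
EDIT: Rereading my post, I realize I wasn't very clear, especially near the end, so here are a few additional remarks.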
1)
The validity of the expression in statement (3) of your theorem is a direct consequence of the definition of $D/d\lambda$ above. Locality lets us calculate in a basis expansion. Restrict $\gamma$ to an interval on which it stays inside one coordinate chart. Let $\partial_\mu$ denote the coordinate basis fields, let $e_\mu=\partial_\mu\circ\gamma$ be the corresponding vector fields along $\gamma$, and write the tangent as $t=t^\sigma e_\sigma$. We then use linearity, the product rule, consistency (with $V=\partial_\mu$), and $\nabla_{\partial_\sigma}\partial_\mu=\Gamma^\nu_{\sigma\mu}\partial_\nu$. For any $v=v^\mu e_\mu$ along $\gamma$, this gives (relabeling dummy indices in the last step):
\begin{align*}
\frac{D}{d\lambda}v&=\frac{D}{d\lambda}(v^\mu e_\mu)=\frac{dv^\mu}{d\lambda}e_\mu+v^\mu\frac{D}{d\lambda}e_\mu=\frac{d v^\mu}{d\lambda}e_\mu+v^\mu(\nabla_t\partial_\mu)\circ\gamma\\
&=\frac{dv^\mu}{d\lambda}e_\mu+v^\mu(\Gamma^\nu_{\sigma\mu}\circ\gamma)t^\sigma(\partial_\nu\circ\gamma)=\left(\frac{dv^\mu}{d\lambda}+t^\nu(\Gamma^\mu_{\nu\sigma}\circ\gamma)v^\sigma\right)e_\mu.
\end{align*}
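As a concrete check of this component formula, here is a minimal Python sketch (my own illustration; the names, the radius and the step count are arbitrary choices). It integrates the parallel-transport equation $\frac{dv^\mu}{d\lambda}+t^\nu\Gamma^\mu_{\nu\sigma}v^\sigma=0$ along the circle $\gamma(\lambda)=(R,\lambda)$ in polar coordinates on flat $\mathbb R^2$, using a hand-written RK4 step. Because the plane is flat, the transported vector should keep constant Cartesian components while its polar components $(v^r,v^\theta)=(\cos\lambda,-\sin\lambda/R)$ rotate.

```python
import math

R = 2.0  # radius of the circle gamma(lambda) = (r = R, theta = lambda); arbitrary choice

def christoffel(r, th):
    """G[mu][nu][sigma] = Gamma^mu_{nu sigma} for flat R^2 in polar coordinates (r, theta)."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def gamma(lam):
    return (R, lam)

def tangent(lam):
    """t^nu = d x^nu(gamma(lambda)) / d lambda."""
    return (0.0, 1.0)

def transport_rhs(lam, v):
    """Parallel transport Dv/dlambda = 0, i.e. dv^mu/dlambda = -t^nu Gamma^mu_{nu sigma} v^sigma."""
    G = christoffel(*gamma(lam))
    t = tangent(lam)
    return [-sum(t[n] * G[m][n][s] * v[s] for n in range(2) for s in range(2))
            for m in range(2)]

def rk4(f, lam0, v0, lam1, steps):
    """Classical fourth-order Runge-Kutta integration of dv/dlambda = f(lambda, v)."""
    h = (lam1 - lam0) / steps
    lam, v = lam0, list(v0)
    for _ in range(steps):
        k1 = f(lam, v)
        k2 = f(lam + h / 2, [vi + h / 2 * ki for vi, ki in zip(v, k1)])
        k3 = f(lam + h / 2, [vi + h / 2 * ki for vi, ki in zip(v, k2)])
        k4 = f(lam + h, [vi + h * ki for vi, ki in zip(v, k3)])
        v = [vi + h / 6 * (a + 2 * b + 2 * c + e)
             for vi, a, b, c, e in zip(v, k1, k2, k3, k4)]
        lam += h
    return v

def to_cartesian(lam, v):
    """Cartesian components of the polar components v = (v^r, v^theta) at gamma(lambda)."""
    r, th = gamma(lam)
    return (math.cos(th) * v[0] - r * math.sin(th) * v[1],
            math.sin(th) * v[0] + r * math.cos(th) * v[1])

v0 = [1.0, 0.0]  # v = d/dr at (R, 0), which is the Cartesian vector (1, 0)
for lam1 in (math.pi / 2, math.pi, 2 * math.pi):
    v = rk4(transport_rhs, 0.0, v0, lam1, 2000)
    print(round(lam1, 4), v, to_cartesian(lam1, v))
```

After a full loop the polar components return to their initial values, as expected for a flat connection.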
2)
Strictly speaking, expression (2), $$ \partial_a v^b+\Gamma^b_{ac}v^c, $$ is not well defined for an arbitrary $v$ along $\gamma$. The partial derivatives $\partial_a v^b$ require $v$ to be defined on an open subset of $M$, and because of the self-intersections mentioned above, $v$ need not extend to such a field. However, the parallel transport equation is a local differential equation, so you can consider it restricted to a sufficiently small open interval in the curve's domain.
Then, provided the tangent of $\gamma$ does not vanish, the restricted curve is no longer self-intersecting. One can show that $v$ then admits a (non-unique) extension to an open neighborhood of the restricted curve, and the result does not depend on which extension is chosen. For any such extension, $$ \frac{Dv^a}{d\lambda}=t^b\partial_b v^a+t^b\Gamma^a_{bc}v^c. $$
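The extension-independence follows from the chain rule. If $V$ is any extension of $v$ to an open set containing the restricted curve, then $V^a\circ\gamma=v^a$, so
$$t^b\,(\partial_bV^a)\circ\gamma=\frac{d x^b(\gamma(\lambda))}{d\lambda}\,(\partial_bV^a)(\gamma(\lambda))=\frac{d}{d\lambda}\big(V^a\circ\gamma\big)=\frac{dv^a}{d\lambda}.$$
This involves only the values of $v$ on the curve. Substituting it back recovers the formula from point 1) above, $\frac{Dv^a}{d\lambda}=\frac{dv^a}{d\lambda}+t^b\Gamma^a_{bc}v^c$.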
Let $\{\partial/\partial y^\nu\}$ be another coordinate frame, and denote by $\widetilde{T}^{a_1\cdots a_k}_{b_1\cdots b_\ell}$ the components of the tensor field $T$ with respect to this coordinate system. Basically, you're asking whether, for a given index $\sigma$, we have $$\frac{\partial (T^{a_1\cdots a_k}_{b_1\cdots b_\ell})}{\partial x^\sigma}= \frac{\partial (\widetilde{T}^{a_1\cdots a_k}_{b_1\cdots b_\ell})}{\partial y^\sigma}?$$ In general the answer is no, because of the transformation law for tensors: the two sides differ by terms that the product rule generates from the derivatives of the coordinate change. To see how bad the discrepancy is, take a $(1,1)$-tensor: $$\begin{align} \frac{\partial \widetilde{T}^a_b}{\partial y^\sigma} &= \frac{\partial}{\partial y^\sigma}\left(\sum_{i,j} \frac{\partial y^a}{\partial x^i} \frac{\partial x^j}{\partial y^b} T^i_j \right) \\ &= \sum_k \frac{\partial x^k}{\partial y^\sigma} \frac{\partial}{\partial x^k}\left(\sum_{i,j} \frac{\partial y^a}{\partial x^i} \frac{\partial x^j}{\partial y^b} T^i_j \right) \\ &= \sum_{i,j,k} \frac{\partial x^k}{\partial y^\sigma}\frac{\partial y^a}{\partial x^i}\frac{\partial x^j}{\partial y^b} {\color{blue}{\frac{\partial T^i_j}{\partial x^k}}} + \mbox{even more trash}.\end{align}$$ The blue term is exactly what the tensor transformation law would produce from $\partial T^i_j/\partial x^k$. The remaining terms, in which the derivative falls on the Jacobian factors, are what spoil the equality.
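Here is a quick numerical illustration of this point (my own sketch; the tensor and the polar coordinate change are arbitrary choices). Take Cartesian coordinates $x^i$, polar coordinates $y^a=(r,\theta)$, and the constant tensor $T=\partial_x\otimes dx$. Its Cartesian components are constant, so the blue term vanishes identically. Its polar components, however, depend on the point (for instance $\widetilde T^r{}_r=\cos^2\theta$ and $\widetilde T^r{}_\theta=-r\cos\theta\sin\theta$), so $\partial\widetilde T/\partial y^\sigma\neq0$. Here the whole derivative comes from the "trash", i.e. from derivatives falling on the Jacobian factors.

```python
import math

# Cartesian coordinates x^i = (x, y); polar coordinates y^a = (r, theta).
def jac_y_wrt_x(r, th):
    """J[a][i] = d y^a / d x^i at the point with polar coordinates (r, th)."""
    return [[math.cos(th), math.sin(th)],
            [-math.sin(th) / r, math.cos(th) / r]]

def jac_x_wrt_y(r, th):
    """K[j][b] = d x^j / d y^b at the same point."""
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th), r * math.cos(th)]]

# Example tensor (an arbitrary choice): T = d/dx (x) dx, with constant Cartesian
# components T^i_j = 1 if i = j = 0 and 0 otherwise, so every dT/dx^k vanishes.
T_cart = [[1.0, 0.0], [0.0, 0.0]]

def T_polar(r, th):
    """Polar components T~^a_b = (d y^a/d x^i)(d x^j/d y^b) T^i_j."""
    J, K = jac_y_wrt_x(r, th), jac_x_wrt_y(r, th)
    return [[sum(J[a][i] * K[j][b] * T_cart[i][j] for i in range(2) for j in range(2))
             for b in range(2)] for a in range(2)]

def dT_polar(a, b, sigma, p, h=1e-6):
    """Central-difference derivative of T~^a_b with respect to y^sigma at polar point p."""
    lo, hi = list(p), list(p)
    lo[sigma] -= h
    hi[sigma] += h
    return (T_polar(*hi)[a][b] - T_polar(*lo)[a][b]) / (2 * h)

p = (1.5, 0.7)
for a in range(2):
    for b in range(2):
        # The tensorial ("blue") part is zero here, so any nonzero value is pure "trash".
        print(a, b, [dT_polar(a, b, s, p) for s in range(2)])
```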