The covariant derivative of a general tensor $T^{a_1\dots a_n}{}_{b_1 \dots b_m}$ is given by
$$\nabla_c T^{a_1\dots a_n}{}_{b_1 \dots b_m} = \partial_c T^{a_1\dots a_n}{}_{b_1 \dots b_m} + \Gamma^{a_1}_{cd}T^{d\,a_2\dots a_n}{}_{b_1 \dots b_m} + \dots - \Gamma^d_{c b_1}T^{a_1\dots a_n}{}_{d\,b_2 \dots b_m} - \dots$$
with one $+\Gamma$ term for each upper index and one $-\Gamma$ term for each lower index.
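For example, for a mixed $(1,1)$ tensor the formula gives exactly one term of each sign. As a quick consistency check, contracting the two indices makes the $\Gamma$ terms cancel after relabeling the dummy indices, so the trace, a scalar, gets an ordinary partial derivative:

$$\nabla_c T^a{}_b = \partial_c T^a{}_b + \Gamma^a_{cd}T^d{}_b - \Gamma^d_{cb}T^a{}_d, \qquad \nabla_c T^a{}_a = \partial_c T^a{}_a + \Gamma^a_{cd}T^d{}_a - \Gamma^d_{ca}T^a{}_d = \partial_c T^a{}_a$$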
Taking the covariant derivative of a covector field (one-form) $V_a$, we find,
$$\nabla_b V_a = \partial_b V_a - \Gamma^c_{ba}V_c$$
Now, the object $\nabla_b V_a$ has two lower indices, so taking the covariant derivative again, we find,
$$\nabla_c (\nabla_b V_a) = \partial_c(\nabla_b V_a) - \Gamma^d_{cb} (\nabla_d V_a) - \Gamma^d_{ca}(\nabla_b V_d)$$
Inserting the original covariant derivative, we find explicitly,
$$\nabla_c (\nabla_b V_a) = \partial_c (\partial_b V_a -\Gamma^{e}_{ba}V_e) - \Gamma^d_{cb}(\partial_d V_a - \Gamma^e_{da}V_e) - \Gamma^d_{ca}(\partial_b V_d - \Gamma^e_{bd}V_e)$$
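Continuing the computation (a sketch, assuming a torsion-free connection, $\Gamma^a_{bc} = \Gamma^a_{cb}$, and Carroll's sign convention for the Riemann tensor), antisymmetrize in $c$ and $b$. The $\partial_c\partial_b V_a$ terms cancel, the terms with one derivative of $V$ cancel in pairs (the $\Gamma^d_{cb}\partial_d V_a$ pair only because of the symmetry of $\Gamma$), and so does $\Gamma^d_{cb}\Gamma^e_{da}V_e$. Only terms proportional to $V_e$ itself survive:

$$[\nabla_c,\nabla_b]V_a = \left(\partial_b\Gamma^e_{ca} - \partial_c\Gamma^e_{ba} + \Gamma^e_{bd}\Gamma^d_{ca} - \Gamma^e_{cd}\Gamma^d_{ba}\right)V_e = -R^e{}_{acb}V_e,$$
$$R^\rho{}_{\sigma\mu\nu} = \partial_\mu\Gamma^\rho_{\nu\sigma} - \partial_\nu\Gamma^\rho_{\mu\sigma} + \Gamma^\rho_{\mu\lambda}\Gamma^\lambda_{\nu\sigma} - \Gamma^\rho_{\nu\lambda}\Gamma^\lambda_{\mu\sigma}$$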
The quote you give from Carroll about the covariant derivative is right: it quantifies the rate of change of a tensor field relative to parallel transport. The covariant derivative of a single tensor at a point doesn't make sense, because it needs the field's values at neighboring points. However, the commutator of covariant derivatives acting on a tensor at a point does.
The situation is analogous to the vector field commutator. Earlier in Carroll, you read that given two vector fields $X$ and $Y$, the composition $XY$ is not a vector field, but the combination
$$[X, Y] = XY-YX$$
is, because the nontensorial parts (the second-derivative terms) cancel out.
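In components, acting on a test function $f$, the second derivatives are exactly the nontensorial piece. The last term below is symmetric under $X \leftrightarrow Y$ because mixed partial derivatives commute, so it drops out of $XY - YX$:

$$XY(f) = X^\lambda\partial_\lambda\left(Y^\mu\partial_\mu f\right) = X^\lambda\left(\partial_\lambda Y^\mu\right)\partial_\mu f + X^\lambda Y^\mu\,\partial_\lambda\partial_\mu f,$$
$$[X,Y]^\mu = X^\lambda\partial_\lambda Y^\mu - Y^\lambda\partial_\lambda X^\mu$$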
Now, heuristically, two covariant derivatives acting on a vector field give
$$\nabla_\mu \nabla_\nu V^\rho(x) = V^\rho(x + \epsilon(\hat{\mu} + \hat{\nu})) - \text{parallel transport of } V^\rho(x) \text{ along } \hat{\nu} \text{ first}$$
where I'm being a bit sloppy with $\epsilon$'s. This doesn't make sense if $V^\rho$ is a single vector, but compare this to the opposite ordering of covariant derivatives,
$$\nabla_\nu \nabla_\mu V^\rho(x) = V^\rho(x + \epsilon(\hat{\mu} + \hat{\nu})) - \text{parallel transport of } V^\rho(x) \text{ along } \hat{\mu} \text{ first}.$$
If we subtract these two expressions, the dependence on $V^\rho(x + \epsilon(\hat{\mu} + \hat{\nu}))$ cancels out, giving a result that only depends on the single vector $V^\rho(x)$.
You can see this happening in Carroll too. In his computation of the components of the Riemann tensor in Eq. (3.111), the derivatives acting on $V^\rho$ itself drop out.
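Written out, the vector version of the statement reads as follows (in Carroll's index conventions, with the torsion tensor $T^\lambda{}_{\mu\nu} = \Gamma^\lambda_{\mu\nu} - \Gamma^\lambda_{\nu\mu}$). For a torsion-free connection such as the Levi-Civita one, the last term vanishes and the right-hand side involves only $V^\sigma(x)$, with no derivatives, which is the precise sense in which the commutator acts on a single vector at a point:

$$[\nabla_\mu,\nabla_\nu]V^\rho = R^\rho{}_{\sigma\mu\nu}V^\sigma - T^\lambda{}_{\mu\nu}\nabla_\lambda V^\rho$$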
The covariant derivative is not zero in general. Take for example plane polar coordinates $(r,\theta)$. The curvature tensor of the flat plane is of course zero, but the covariant derivative of a vector field $\mathbf V = f(r,\theta)\mathbf e_r + g(r,\theta)\mathbf e_{\theta}$ with respect to $r$ is: $$\frac{d\mathbf V}{dr} = \frac{\partial f}{\partial r}\mathbf e_r + f \frac{\partial \mathbf e_r}{\partial r} + \frac{\partial g}{\partial r}\mathbf e_{\theta} + g\frac{\partial \mathbf e_{\theta}}{\partial r}$$
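The unit vector $\mathbf e_r$ doesn't change when $r$ changes, while for the coordinate basis vector $\mathbf e_{\theta} = \partial/\partial\theta$, which has length $r$, $$\frac{\partial \mathbf e_{\theta}}{\partial r} = \frac{1}{r}\mathbf e_{\theta}$$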
So,
$$\frac{d\mathbf V}{dr} = \frac{\partial f}{\partial r}\mathbf e_r + \left(\frac{\partial g}{\partial r} + \frac{g}{r}\right)\mathbf e_{\theta}$$
which is nonzero in general. The derivative of the vector shown above is
$$\lim_{\Delta r \to 0}\frac{\mathbf V(P+\Delta P) - \mathbf V(P)}{\Delta r}$$
The difference in the numerator can be visualized by taking the arrow at $P$, moving it to $P+\Delta P$ while keeping it parallel to itself, and subtracting. That displacement operation is parallel transport, which is trivial in this case because in the Euclidean plane there is exactly one line through a given point parallel to a given line. In a curved space, this notion of parallelism must be defined.
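A quick numerical sanity check of the $\mathbf e_\theta$ component (a sketch, not part of the original answer): in the flat plane, arrows at nearby points can be compared directly through their Cartesian components, so differentiating those and projecting back onto $\mathbf e_r$ and $\mathbf e_\theta$ should reproduce $\partial_r f$ and $\partial_r g + g/r$. This agrees with the component formula $\nabla_r V^\theta = \partial_r V^\theta + \Gamma^\theta_{r\theta}V^\theta$ with $\Gamma^\theta_{r\theta} = 1/r$. The functions `f`, `g`, the sample point and the step `h` are arbitrary, hypothetical choices.

```python
import math

# Hypothetical components of V = f e_r + g e_theta in the coordinate basis,
# where e_r is a unit vector and e_theta = d/dtheta has length r.
def f(r, th):
    return r * r * math.cos(th)

def g(r, th):
    return math.sin(th) / r + r

def basis(r, th):
    """Cartesian components of e_r and e_theta at the point (r, theta)."""
    return (math.cos(th), math.sin(th)), (-r * math.sin(th), r * math.cos(th))

def v_cartesian(r, th):
    """Cartesian components of V at (r, theta)."""
    e_r, e_th = basis(r, th)
    return [f(r, th) * a + g(r, th) * b for a, b in zip(e_r, e_th)]

def dV_dr_direct(r, th, h=1e-6):
    """Subtract Cartesian components at r+h and r-h (parallel transport is
    trivial in the plane), then project back onto e_r and e_theta."""
    plus, minus = v_cartesian(r + h, th), v_cartesian(r - h, th)
    d = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
    e_r, e_th = basis(r, th)
    comp_r = sum(a * b for a, b in zip(d, e_r))            # |e_r|^2 = 1
    comp_th = sum(a * b for a, b in zip(d, e_th)) / r**2   # |e_theta|^2 = r^2
    return comp_r, comp_th

def dV_dr_formula(r, th, h=1e-6):
    """Components of df/dr e_r + (dg/dr + g/r) e_theta."""
    df = (f(r + h, th) - f(r - h, th)) / (2 * h)
    dg = (g(r + h, th) - g(r - h, th)) / (2 * h)
    return df, dg + g(r, th) / r

print(dV_dr_direct(1.7, 0.4))
print(dV_dr_formula(1.7, 0.4))  # should agree up to finite-difference error
```

Replacing `g(r, th) / r` by `1 / r` in `dV_dr_formula` makes the two results disagree for this choice of `g`.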
If we take the derivative of the expression above with respect to $\theta$, the result equals what we get by differentiating with respect to $\theta$ first and then with respect to $r$: the two second covariant derivatives commute. That vanishing commutator is what zero curvature means.
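The commuting of the two orderings can also be checked numerically. The sketch below builds $\nabla_i V^k$ from Christoffel symbols with central finite differences, applies the general formula once more (treating $\nabla_i V^k$ as a $(1,1)$ tensor) to get $\nabla_j\nabla_i V^k$, and subtracts the two orderings. The test field `V`, the step `h` and all function names are hypothetical choices for illustration. The Christoffel symbols are the standard ones for $ds^2 = dr^2 + r^2 d\theta^2$. The unit 2-sphere is included only for contrast: there the result should match $R^\rho{}_{\sigma\theta\phi}V^\sigma$, i.e. $(\sin^2\theta\, V^\phi, -V^\theta)$, in Carroll's conventions.

```python
import math

def polar_christoffel(x):
    """Gamma[k][i][j] = Gamma^k_{ij} for the flat plane, x = (r, theta)."""
    r = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r                      # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r    # Gamma^theta_{r theta} (symmetric)
    return G

def sphere_christoffel(x):
    """Gamma[k][i][j] for the unit 2-sphere, x = (theta, phi)."""
    th = x[0]
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)              # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)  # Gamma^phi_{theta phi}
    return G

def finite_diff(F, x, i, h=1e-4):
    """Central difference along coordinate i of a list-valued function F."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))]

def cov_grad(V, Gamma, x):
    """T[i][k] = nabla_i V^k = d_i V^k + Gamma^k_{ij} V^j."""
    G, v = Gamma(x), V(x)
    T = []
    for i in range(2):
        dv = finite_diff(V, x, i)
        T.append([dv[k] + sum(G[k][i][j] * v[j] for j in range(2))
                  for k in range(2)])
    return T

def second_cov(V, Gamma, x, j, i):
    """nabla_j nabla_i V^k: one +Gamma term (upper k), one -Gamma term (lower i)."""
    G, T = Gamma(x), cov_grad(V, Gamma, x)
    dT = finite_diff(lambda y: cov_grad(V, Gamma, y)[i], x, j)
    return [dT[k]
            + sum(G[k][j][l] * T[i][l] for l in range(2))
            - sum(G[l][j][i] * T[l][k] for l in range(2))
            for k in range(2)]

def commutator(V, Gamma, x):
    """[nabla_0, nabla_1] V^k."""
    a = second_cov(V, Gamma, x, 0, 1)
    b = second_cov(V, Gamma, x, 1, 0)
    return [p - q for p, q in zip(a, b)]

def V(x):
    """Hypothetical smooth test field (V^0, V^1); any smooth choice works."""
    return [math.sin(x[0]) + x[1], x[0] * math.cos(x[1])]

print(commutator(V, polar_christoffel, [1.7, 0.4]))  # flat plane: both near 0
x = [1.1, 0.3]
print(commutator(V, sphere_christoffel, x))
print([math.sin(x[0]) ** 2 * V(x)[1], -V(x)[0]])     # curvature prediction
```

Nested finite differences are crude, but with `h=1e-4` the discretization error stays far below the size of the sphere's curvature term, which is enough to tell zero from nonzero.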