By the product rule for covariant differentiation,
$$ \nabla_{X} R(Y,Z)W + \nabla_{Y} R(Z,X)W + \nabla_{Z} R(X,Y)W
\\= \nabla_X(R(Y,Z)W) + \nabla_Y(R(Z,X)W) + \nabla_Z(R(X,Y)W)
\\- R(\nabla_X Y,Z)W - R(Y,\nabla_X Z)W
\\- R(\nabla_Y Z,X)W - R(Z,\nabla_Y X)W
\\- R(\nabla_Z X,Y)W - R(X,\nabla_Z Y)W
\\- R(Y,Z)\nabla_X W - R(Z,X)\nabla_Y W - R(X,Y)\nabla_Z W
$$
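To see where each group of terms comes from, here is the rule for a single derivative. This is a sketch that reads $\nabla_X R(Y,Z)W$ as $(\nabla_X R)(Y,Z)W$, the covariant derivative of the tensor $R$ evaluated on $Y, Z, W$:
$$ (\nabla_X R)(Y,Z)W = \nabla_X(R(Y,Z)W) - R(\nabla_X Y,Z)W - R(Y,\nabla_X Z)W - R(Y,Z)\nabla_X W $$
Summing this over the cyclic permutations of $(X,Y,Z)$ gives the display above.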
Also, since $R(A,B) = -R(B,A)$ and the torsion is $\tau(X,Y) = \nabla_X Y - \nabla_Y X - [X,Y]$,
$$R(\nabla_X Y,Z) + R(Y,\nabla_X Z) + R(\nabla_Y Z,X) + R(Z,\nabla_Y X) + R(\nabla_Z X,Y) + R(X,\nabla_Z Y)
\\= R(\nabla_X Y - \nabla_Y X,Z) + R(\nabla_Y Z - \nabla_Z Y,X) + R(\nabla_Z X - \nabla_X Z,Y)
\\= R(\tau(X,Y) + [X,Y],Z) + R(\tau(Y,Z) + [Y,Z],X) + R(\tau(Z,X) + [Z,X],Y)
$$
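And, expanding each curvature term via $R(X,Y)W = \nabla_X \nabla_Y W - \nabla_Y \nabla_X W - \nabla_{[X,Y]} W$,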
$$ \nabla_X(R(Y,Z)W) + \nabla_Y(R(Z,X)W) + \nabla_Z(R(X,Y)W)
\\= \nabla_X \nabla_Y \nabla_Z W - \nabla_X \nabla_Z \nabla_Y W
+ \nabla_Y \nabla_Z \nabla_X W - \nabla_Y \nabla_X \nabla_Z W
+ \nabla_Z \nabla_X \nabla_Y W - \nabla_Z \nabla_Y \nabla_X W
\\ - \nabla_X \nabla_{[Y,Z]} W - \nabla_Y \nabla_{[Z,X]} W - \nabla_Z \nabla_{[X,Y]} W
\\= \nabla_Y \nabla_Z \nabla_X W - \nabla_Z \nabla_Y \nabla_X W
+ \nabla_Z \nabla_X \nabla_Y W - \nabla_X \nabla_Z \nabla_Y W
+ \nabla_X \nabla_Y \nabla_Z W- \nabla_Y \nabla_X \nabla_Z W
\\ - \nabla_{[Y,Z]} \nabla_X W - R(X,[Y,Z])W - \nabla_{[X,[Y,Z]]}W
\\ - \nabla_{[Z,X]} \nabla_Y W - R(Y,[Z,X])W - \nabla_{[Y,[Z,X]]}W
\\ - \nabla_{[X,Y]} \nabla_Z W - R(Z,[X,Y])W - \nabla_{[Z,[X,Y]]}W
\\= R(Y,Z)\nabla_X W + R(Z,X)\nabla_Y W + R(X,Y)\nabla_Z W
\\ - R(X,[Y,Z])W - R(Y,[Z,X])W - R(Z,[X,Y])W
$$
noting that $\nabla_{[X,[Y,Z]]}W + \nabla_{[Y,[Z,X]]}W + \nabla_{[Z,[X,Y]]}W = \nabla_{[X,[Y,Z]] + [Y,[Z,X]] + [Z,[X,Y]]}W = 0$ by the Jacobi identity.
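The step that commutes $\nabla_X$ past $\nabla_{[Y,Z]}$ can be spelled out as follows. Applying the definition of the curvature to the pair $(X,[Y,Z])$ gives
$$ R(X,[Y,Z])W = \nabla_X \nabla_{[Y,Z]} W - \nabla_{[Y,Z]} \nabla_X W - \nabla_{[X,[Y,Z]]} W $$
so that
$$ \nabla_X \nabla_{[Y,Z]} W = \nabla_{[Y,Z]} \nabla_X W + R(X,[Y,Z])W + \nabla_{[X,[Y,Z]]} W $$
The other two terms follow by cyclically permuting $(X,Y,Z)$.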
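Now combine the three displays. The Lie bracket terms cancel in pairs, since $R([Y,Z],X) = -R(X,[Y,Z])$, and the terms $R(Y,Z)\nabla_X W$, etc., cancel as well, leaving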
$$ \nabla_{X} R(Y,Z) + \nabla_{Y} R(Z,X) + \nabla_{Z} R(X,Y)
= R(X,\tau(Y,Z)) + R(Y,\tau(Z,X)) + R(Z,\tau(X,Y))$$
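As a sanity check, consider the torsion-free case (an extra assumption, not needed for the identity above). Setting $\tau = 0$ makes the right-hand side vanish, which should recover the familiar second Bianchi identity:
$$ \nabla_{X} R(Y,Z) + \nabla_{Y} R(Z,X) + \nabla_{Z} R(X,Y) = 0 $$
In index notation, with the usual convention that the last index after the semicolon is the derivative index, this reads
$$ R^a{}_{bcd;e} + R^a{}_{bde;c} + R^a{}_{bec;d} = 0 $$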
Note that the argument is simpler if all the Lie brackets vanish (for instance, for coordinate vector fields), which is a much weaker assumption than working in normal coordinates. I would say that the core of the proof is this identity, which is just a regrouping of the same six terms:
$$ \nabla_X \nabla_Y \nabla_Z W - \nabla_X \nabla_Z \nabla_Y W
+ \nabla_Y \nabla_Z \nabla_X W - \nabla_Y \nabla_X \nabla_Z W
+ \nabla_Z \nabla_X \nabla_Y W - \nabla_Z \nabla_Y \nabla_X W
\\= \nabla_Y \nabla_Z \nabla_X W - \nabla_Z \nabla_Y \nabla_X W
+ \nabla_Z \nabla_X \nabla_Y W - \nabla_X \nabla_Z \nabla_Y W
+ \nabla_X \nabla_Y \nabla_Z W- \nabla_Y \nabla_X \nabla_Z W
$$
Finally, let me comment that the problem with using normal coordinates is that one implicitly uses that the connection comes from a Riemannian metric. The proofs given here are simple manipulations of the definitions.
Best Answer
The divergence of a vector field $V = V^a \partial _a$ on a (pseudo-)Riemannian manifold is given by $\operatorname{div} V = V^a{}_{;a}$. In words, this is obtained by taking the trace of the total covariant derivative. In the special case of $\mathbb R^n$ with a flat Euclidean or pseudo-Euclidean metric, this yields the usual calculus formula for the divergence.
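To make the formula concrete, here is a sketch that assumes the Levi-Civita connection of the metric $g$, with Christoffel symbols $\Gamma^a_{bc}$ (notation not used above). Expanding the covariant derivative and using $\Gamma^a_{ab} = \partial_b \ln\sqrt{|g|}$, where $|g|$ is the absolute value of the determinant of the metric, gives
$$ V^a{}_{;a} = \partial_a V^a + \Gamma^a_{ab} V^b = \frac{1}{\sqrt{|g|}}\, \partial_a\!\left(\sqrt{|g|}\, V^a\right) $$
For example, in polar coordinates on the Euclidean plane, $g = dr^2 + r^2 d\theta^2$ and $\sqrt{|g|} = r$, so for coordinate-basis components $V^r, V^\theta$,
$$ \operatorname{div} V = \frac{1}{r}\,\partial_r\!\left(r V^r\right) + \partial_\theta V^\theta $$
In Cartesian coordinates the Christoffel symbols vanish and the formula reduces to $\partial_a V^a$.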
By extension, it is common to define the divergence of an arbitrary tensor field as the trace of its total covariant derivative on (usually) the last two indices. So if $G$ is the (contravariant) Einstein tensor, then its divergence would be the vector field $\operatorname{div} G = G^{ba}{}_{;a} \partial_b$. Because $G$ is symmetric, this is also equal to $G^{ab}{}_{;a}\partial _b$.
You can also apply this to the covariant Einstein tensor with components $G_{ab}$; its divergence is the $1$-form $G_{ba;}{}^{a}dx^b$.
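For reference, these divergences can be written out in components. This is a sketch that again assumes the Levi-Civita connection, with Christoffel symbols $\Gamma^a_{bc}$:
$$ G^{ba}{}_{;a} = \partial_a G^{ba} + \Gamma^b_{ac}\, G^{ca} + \Gamma^a_{ac}\, G^{bc} $$
Since the metric is covariantly constant, indices can be lowered through the derivative, so the covariant version is
$$ G_{ba;}{}^{a} = g^{ac}\, G_{ba;c} = g_{bd}\, G^{da}{}_{;a} $$
For the Einstein tensor specifically, both expressions vanish identically as a consequence of the contracted second Bianchi identity.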