I think that 3 implies 3-alternative but the converse is not true. You can actually view 3 as a definition for $\nabla_XT$, which only needs the action of $\nabla_X$ on vector fields and one-forms. (And the latter is reduced to the action of $\nabla_X$ on vector fields.) The resulting operation can be shown to be compatible with tensor products by a direct computation. In the case of $(0,1)$-tensor fields, this looks as follows. Using 3, you can expand $\nabla_X(T\otimes S)(Y,Z)$ as
$$X((T\otimes S)(Y,Z))-(T\otimes S)(\nabla_XY,Z)-(T\otimes S)(Y,\nabla_XZ)=X(T(Y)S(Z))-T(\nabla_XY)S(Z)-T(Y)S(\nabla_XZ).$$
Using the product rule in the first term and collecting, this reads as
$$(X(T(Y))-T(\nabla_XY))S(Z)+T(Y)(X(S(Z))-S(\nabla_XZ))=((\nabla_XT)(Y))S(Z)+T(Y)((\nabla_XS)(Z)),$$
and this proves the claim.
But in addition to 3-alternative, you need a condition that $\nabla_X$ is compatible with contractions. This also follows from 3 by direct computations. Having that, you can deduce 3 from 3-alternative by using the fact that $T(\omega,\dots,Y,\dots)$ can be obtained as a "complete contraction" (which is a sequence of contractions) from $T\otimes\omega\otimes\dots\otimes Y\otimes\dots$.
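For instance, in the simplest case of a one-form $T$ and a vector field $Y$, the deduction can be sketched as follows. This assumes that $\nabla_X$ acts on functions by $\nabla_Xf=X(f)$, and writes $C$ for the contraction, so that $C(T\otimes Y)=T(Y)$ (the symbol $C$ is my notation, not part of the question). Compatibility with contractions and 3-alternative give
$$X(T(Y))=\nabla_X(C(T\otimes Y))=C(\nabla_X(T\otimes Y))=C(\nabla_XT\otimes Y+T\otimes\nabla_XY)=(\nabla_XT)(Y)+T(\nabla_XY),$$
and solving for $(\nabla_XT)(Y)$ recovers 3 in this case, namely $(\nabla_XT)(Y)=X(T(Y))-T(\nabla_XY)$.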
Concerning your second question, the interpretation as mapping $(k,\ell)$ tensor fields to $(k,\ell+1)$ tensor fields needs leaving the slot for the vector field $X$ free. So you define $\nabla T$ as mapping $(\omega_1,\dots,\omega_k,Y_1,\dots,Y_{\ell+1})$ to $(\nabla_{Y_1}T)(\omega_1,\dots,\omega_k,Y_2,\dots,Y_{\ell+1})$. Rule 4 implies that this is indeed a tensor field.
There is no interpretation in terms of $(k+1,\ell)$ tensor fields. While you can form $X\otimes T$ and this is a $(k+1,\ell)$ tensor field, the covariant derivative $\nabla_XT$ cannot be obtained from $X\otimes T$. This is because for a smooth function $f$, you get $(fX)\otimes T=X\otimes fT$ but $\nabla_X(fT)=X(f)T+f\nabla_XT$ while $\nabla_{fX}T=f\nabla_XT$.
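To spell out the tensoriality check, assuming Rule 4 is the $C^\infty$-linearity $\nabla_{fX}T=f\nabla_XT$ in the direction argument: for a smooth function $f$ you get
$$(\nabla T)(\omega_1,\dots,\omega_k,fY_1,Y_2,\dots)=(\nabla_{fY_1}T)(\omega_1,\dots,\omega_k,Y_2,\dots)=f\,(\nabla T)(\omega_1,\dots,\omega_k,Y_1,Y_2,\dots).$$
Linearity over smooth functions in the remaining slots holds because $\nabla_{Y_1}T$ is itself a tensor field.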
Best Answer
The wording "directional covariant derivative" is not widely used in the literature, but some authors (e.g. Amari, Information Geometry and Its Applications, p. 117) use it, perhaps to distinguish it from the "total covariant derivative" (see e.g. J. M. Lee, Riemannian Manifolds: An Introduction to Curvature, p. 54), which is a tensor $\nabla T$ of higher rank, given by $$ \nabla T (\omega_1, \dots, \omega_p, V_1, \dots, V_q, X) = \nabla_X T (\omega_1, \dots, \omega_p, V_1, \dots, V_q). $$ As I understand it, if $\nabla T$ denotes the total covariant derivative as above, then $\nabla_{X} T$ is the directional covariant derivative of $T$ in the direction of the vector field $X$, and $\nabla_X T (\dots) = \nabla T (\dots, X) $.