The Leibniz rule indeed holds even in this case.
One quick way to see this is to recall that, in three dimensions, the cross product can be equivalently defined by
$$
A \times B := \big(\star(A^{\flat} \wedge B^{\flat})\big)^{\sharp},
$$
where $\star$ denotes the Hodge star.
Assuming that the covariant derivative works as expected (in particular, that it commutes with $\flat$, $\sharp$ and $\star$), we may write
$$
\begin{align}
\nabla_{v}(A \times B) &= \big(\star(\nabla_{v}A^{\flat} \wedge B^{\flat} + A^{\flat} \wedge \nabla_{v}B^{\flat})\big)^{\sharp} \\
&= \big(\star(\nabla_{v}A^{\flat} \wedge B^{\flat})\big)^{\sharp} + \big(\star(A^{\flat} \wedge \nabla_{v}B^{\flat})\big)^{\sharp} \\
&= \nabla_{v}A \times B + A \times \nabla_{v}B
\end{align}
$$
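In flat $\mathbb R^3$ with an orthonormal basis, $\flat$ and $\sharp$ leave components unchanged, and the 2-form $A^\flat\wedge B^\flat$ is turned into a vector by the Hodge star, $(\star\omega)_i=\tfrac12\varepsilon_{ijk}\omega_{jk}$. The following is a minimal sketch (helper names are hypothetical) that checks numerically that this reproduces the usual component formula for the cross product.

```python
def levi_civita(i, j, k):
    """Sign of the permutation (i, j, k) of (0, 1, 2); 0 if an index repeats."""
    if len({i, j, k}) < 3:
        return 0
    perm = (i, j, k)
    # Parity from the number of inversions.
    inversions = sum(1 for x in range(3) for y in range(x + 1, 3) if perm[x] > perm[y])
    return -1 if inversions % 2 else 1


def wedge(a, b):
    """Components omega[i][j] = a_i b_j - a_j b_i of the 2-form a ^ b."""
    return [[a[i] * b[j] - a[j] * b[i] for j in range(3)] for i in range(3)]


def hodge_star_2form(omega):
    """(*omega)_i = 1/2 sum_{j,k} eps_{ijk} omega_{jk} (Euclidean R^3, orthonormal basis)."""
    return [0.5 * sum(levi_civita(i, j, k) * omega[j][k]
                      for j in range(3) for k in range(3))
            for i in range(3)]


def cross(a, b):
    """The usual component formula for the cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


A, B = [1.0, 2.0, 3.0], [-4.0, 0.5, 2.0]
print(hodge_star_2form(wedge(A, B)))  # [2.5, -14.0, 8.5]
print(cross(A, B))                    # [2.5, -14.0, 8.5]
```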
If we look at this calculation more carefully, we observe that several different covariant derivatives are involved! The normal connection acts on normal fields such as $A \times B$ here, while the "intrinsic" covariant derivative acts on the tangential fields $A$ and $B$.
We should have adorned our $\nabla$s with marks indicating the bundle they act on, but it is quite customary in differential geometry to use the same $\nabla$ for all bundles involved in a calculation, provided the reader knows which bundle the sections are taken from.
In fact, if one diligently writes out all the definitions, it becomes clear that the coordinate expressions of these operations differ strikingly.
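To make the bookkeeping of the different connections concrete, here is a minimal numerical sketch for the unit sphere in $\mathbb R^3$, an example chosen for illustration and not taken from the question. Along a curve $\gamma$ on the sphere the outward normal is $N=\gamma$; the intrinsic covariant derivative of a tangent field is taken as the tangential projection of its ordinary derivative, and the normal connection of a normal field as the normal projection. The curve and the fields `field_A`, `field_B` are hypothetical choices, and derivatives are approximated by central differences.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(c, a):
    return tuple(c * x for x in a)


def gamma(t):
    """Hypothetical curve on the unit sphere; the unit normal there is N = gamma(t)."""
    theta, phi = 1.0 + 0.3 * t, 0.5 + t
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def tangent_part(w, n):
    return add(w, scale(-dot(w, n), n))


def normal_part(w, n):
    return scale(dot(w, n), n)


def field_A(t):
    """Hypothetical tangent field along gamma (projection of an ambient vector)."""
    return tangent_part((1.0 + t, 2.0, -t * t), gamma(t))


def field_B(t):
    """Another hypothetical tangent field along gamma."""
    return tangent_part((math.cos(t), 0.5, 1.0), gamma(t))


def ddt(f, t, h=1e-5):
    """Central-difference derivative of a vector-valued function of t."""
    return scale(1.0 / (2 * h), add(f(t + h), scale(-1.0, f(t - h))))


t0 = 0.2
n0 = gamma(t0)
# Intrinsic (tangential) covariant derivatives of A and B along gamma.
nabla_A = tangent_part(ddt(field_A, t0), n0)
nabla_B = tangent_part(ddt(field_B, t0), n0)
# Normal connection applied to the normal field A x B.
lhs = normal_part(ddt(lambda t: cross(field_A(t), field_B(t)), t0), n0)
rhs = add(cross(nabla_A, field_B(t0)), cross(field_A(t0), nabla_B))
print(lhs)
print(rhs)
```

The two printed vectors should agree up to finite-difference error: the parts of the ambient derivatives of $A$ and $B$ that point along $N$ only produce tangential contributions to the derivative of $A\times B$, which the normal projection discards.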
> In what sense is the connection enabling one to compare the vector field at two different points on the manifold [...], when the mapping is from the (Cartesian product of) the set of tangent vector fields to itself? I thought that the connection ∇ "connected" two neighbouring tangent spaces through the notion of parallel transport [...]
Seeing a connection only as a mapping $\nabla: \mathcal{X}(M)\times\mathcal{X}(M)\rightarrow\mathcal{X}(M)$ is too restrictive. A connection is often also viewed as a map $Y\mapsto\nabla Y\in\Gamma(TM\otimes TM^*)$, which highlights its role as a derivative. The important point, however, is that $\nabla$ is $C^\infty(M)$-linear in the first argument, which implies that the value $\nabla_X Y|_p$ depends only on $X_p$, in the sense that
$$
X_p=Z_p \Rightarrow \nabla_X Y|_p = \nabla_Z Y|_p.
$$
Hence, for every $v\in TM_p$, $\nabla_vY$ is well-defined. This leads directly to the definition of parallel vector fields and parallel transport (as I think you already know).
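In coordinates this pointwise dependence is visible directly: $(\nabla_X Y)^k = X^i(\partial_i Y^k + \Gamma^k_{ij}Y^j)$ uses only the values $X^i(p)$, while $Y$ is differentiated. Below is a minimal sketch in polar coordinates $(r,\theta)$ on the flat plane, where the nonzero Christoffel symbols are $\Gamma^r_{\theta\theta}=-r$ and $\Gamma^\theta_{r\theta}=\Gamma^\theta_{\theta r}=1/r$. The fields `X`, `Z` and `Y` are hypothetical choices, with `Z` agreeing with `X` only at $p$, and partial derivatives of `Y` are approximated by central differences.

```python
import math


def christoffel_polar(r, theta):
    """G[k][i][j] for the flat metric dr^2 + r^2 dtheta^2; index 0 = r, 1 = theta."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G


def partial(f, x, i, h=1e-6):
    """Central difference of a vector-valued f with respect to coordinate i."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return [(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))]


def covariant_derivative(X, Y, p):
    """(nabla_X Y)^k at p; X is evaluated only at p, Y is differentiated."""
    Xp, Yp = X(p), Y(p)
    G = christoffel_polar(*p)
    dY = [partial(Y, p, i) for i in range(2)]  # dY[i][k] = d_i Y^k
    return [sum(Xp[i] * (dY[i][k] + sum(G[k][i][j] * Yp[j] for j in range(2)))
                for i in range(2))
            for k in range(2)]


p = [2.0, 0.7]
Y = lambda x: [x[0] * math.sin(x[1]), x[0] ** 2]
X = lambda x: [1.0, 0.5]
# Z differs from X away from p but agrees with it at p.
Z = lambda x: [1.0 + (x[0] - p[0]) ** 2, 0.5 + math.sin(x[1] - p[1])]
print(covariant_derivative(X, Y, p))
print(covariant_derivative(Z, Y, p))
```

Both lines print the same vector, since `X(p) == Z(p)`.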
Vice versa, given parallel transport maps $\Gamma(\gamma)^t_s: TM_{\gamma(s)}\rightarrow TM_{\gamma(t)}$, one can recover the connection via
$$
\nabla_X Y|_p = \frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} \quad(\gamma \text{ is an integral curve of } X \text{ with } \gamma(0)=p).
$$
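This formula can be checked in a case where parallel transport is known explicitly. In the flat plane, parallel transport keeps Cartesian components fixed, so in polar coordinates $\Gamma(\gamma)^0_t$ (transport, not a Christoffel symbol) amounts to converting polar components at $\gamma(t)$ to Cartesian ones and back at $\gamma(0)$. The minimal sketch below, with a hypothetical field `Y` and vector `Xp`, differentiates this at $t=0$ and compares the result with the Christoffel-symbol formula. Since only first order in $t$ matters, it uses the curve $\gamma(t)=p+tX_p$ in coordinates, which satisfies $\gamma'(0)=X_p$, instead of an exact integral curve.

```python
import math


def polar_to_cartesian(v, r, theta):
    """Cartesian components of v = v^r d_r + v^theta d_theta at (r, theta)."""
    vr, vt = v
    c, s = math.cos(theta), math.sin(theta)
    # d_r = (cos, sin) and d_theta = r * (-sin, cos)
    return (vr * c - vt * r * s, vr * s + vt * r * c)


def cartesian_to_polar(w, r, theta):
    """Inverse of polar_to_cartesian at the same point."""
    wx, wy = w
    c, s = math.cos(theta), math.sin(theta)
    return (wx * c + wy * s, (-wx * s + wy * c) / r)


def Y(r, theta):
    """Hypothetical vector field, given by its polar components."""
    return (r * math.sin(theta), r ** 2)


p = (2.0, 0.7)
Xp = (1.0, 0.5)


def transported_back(t):
    """Gamma(gamma)^0_t Y_{gamma(t)} in polar components at p, for gamma(t) = p + t * Xp."""
    r, theta = p[0] + t * Xp[0], p[1] + t * Xp[1]
    w = polar_to_cartesian(Y(r, theta), r, theta)  # flat transport: Cartesian parts fixed
    return cartesian_to_polar(w, *p)


h = 1e-5
via_transport = [(a - b) / (2 * h) for a, b in zip(transported_back(h), transported_back(-h))]

# Christoffel formula with Gamma^r_{theta theta} = -r, Gamma^theta_{r theta} = Gamma^theta_{theta r} = 1/r.
r0, th0 = p
dY_dr = (math.sin(th0), 2 * r0)   # exact partial derivatives of the chosen Y
dY_dth = (r0 * math.cos(th0), 0.0)
Yp = Y(r0, th0)
via_christoffel = (
    Xp[0] * dY_dr[0] + Xp[1] * dY_dth[0] - r0 * Xp[1] * Yp[1],
    Xp[0] * dY_dr[1] + Xp[1] * dY_dth[1] + (Xp[0] * Yp[1] + Xp[1] * Yp[0]) / r0,
)
print(via_transport)
print(via_christoffel)
```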
This is exactly the generalisation of directional derivatives, in the sense that we vary $Y$ in the direction of $X_p$ in a parallel manner.
In Euclidean space this indeed reduces to the directional derivative: using the identity chart, every vector field can be written as $Y_p=(p,V(p))$ for $V:\mathbb R^n\rightarrow \mathbb R^n$ and the parallel transport is just given by
$$
\Gamma(\gamma)_s^t (\gamma(s),v)=(\gamma(t),v).
$$
Hence, we find in Euclidean space:
$$
\frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} = \frac{d}{dt}\bigg|_{t=0}(p,V(\gamma(t))) = (p,DV\cdot\gamma'(0)),
$$
which is exactly the directional derivative of $V$ in the direction $v=\gamma'(0)$.
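In the identity chart the recipe is therefore just a derivative of the components along the curve, because transporting back does not change them. A minimal sketch with a hypothetical field `V` on $\mathbb R^2$ and a hypothetical curve `gamma` through $p=(1,2)$ compares a finite-difference derivative of $V(\gamma(t))$ with the Jacobian product $DV(p)\cdot\gamma'(0)$.

```python
import math


def V(x, y):
    """Hypothetical vector field V: R^2 -> R^2."""
    return (x * y, math.sin(x) + y ** 2)


def jacobian_V(x, y):
    """Exact Jacobian DV of the field above; rows are the components of V."""
    return ((y, x), (math.cos(x), 2 * y))


def gamma(t):
    """A curve with gamma(0) = p = (1, 2) and gamma'(0) = (3, -1)."""
    return (1.0 + 3.0 * t + t ** 2, 2.0 - t)


h = 1e-6
# Euclidean parallel transport keeps components, so Gamma^0_t Y_{gamma(t)} has components V(gamma(t)).
numeric = [(a - b) / (2 * h) for a, b in zip(V(*gamma(h)), V(*gamma(-h)))]
J = jacobian_V(1.0, 2.0)
v = (3.0, -1.0)
exact = [J[i][0] * v[0] + J[i][1] * v[1] for i in range(2)]
print(numeric, exact)
```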
Back to the original question: I think it is hard to see how a connection "connects neighbouring tangent spaces" from the axioms alone. Keep in mind, however, that the contemporary formalism has gone through many layers of abstraction since its beginnings and has been reduced to its core, the axioms (for a survey, see also Wikipedia). To get the whole picture, it is essential to explore all possible interpretations and consequences of the definition, since they often led to the definition in the first place. In my opinion, the connection is defined as it is with the image in mind that it is an infinitesimal version of parallel transport. From this starting point, properties such as the Leibniz rule are consequences. Conversely, having a differential operator $\nabla$ satisfying linearity, the Leibniz rule and so on is fully equivalent to having parallel transport in the first place. In modern mathematics these properties are therefore taken as the defining properties (axioms) of a connection, mainly because they are easier to handle and easier to generalise to arbitrary vector bundles.
> Given this, what does the quantity $\nabla_{e_\mu}e_\nu=\Gamma^\lambda_{\mu\nu}e_\lambda$ represent? [...]
As you wrote, the connection coefficients (Christoffel symbols) $\Gamma^\lambda_{\mu\nu}$ are the components of the connection in a local frame and are needed for explicit computations. I think that at this level you can't get much meaning out of these coefficients. However, they reappear in a nicer way if you restate everything in the Cartan formalism and study Cartan and/or principal connections. The Wikipedia article on connection forms tries to give an introduction to this approach.
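As an illustration of such an explicit computation, assuming the Levi-Civita connection (the question's connection need not be one) and a coordinate frame, the coefficients follow from the metric via $\Gamma^\lambda_{\mu\nu}=\tfrac12 g^{\lambda\sigma}(\partial_\mu g_{\sigma\nu}+\partial_\nu g_{\sigma\mu}-\partial_\sigma g_{\mu\nu})$. The minimal sketch below uses the round metric on the unit sphere in coordinates $(\theta,\phi)$ as an example, approximates metric derivatives by central differences, and compares with the standard values $\Gamma^\theta_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi}=\cot\theta$.

```python
import math


def metric(x):
    """Round metric on the unit sphere, coordinates x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def dmetric(x, mu, h=1e-6):
    """Central difference of g_{ab} with respect to coordinate mu."""
    xp, xm = list(x), list(x)
    xp[mu] += h
    xm[mu] -= h
    gp, gm = metric(xp), metric(xm)
    return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(2)] for a in range(2)]


def christoffel(x):
    """G[lam][mu][nu] = Gamma^lam_{mu nu} of the Levi-Civita connection of `metric` at x."""
    n = 2
    ginv = inverse_2x2(metric(x))
    dg = [dmetric(x, mu) for mu in range(n)]  # dg[mu][a][b] = d_mu g_ab
    return [[[0.5 * sum(ginv[lam][s] * (dg[mu][s][nu] + dg[nu][s][mu] - dg[s][mu][nu])
                        for s in range(n))
              for nu in range(n)] for mu in range(n)] for lam in range(n)]


x = (1.0, 0.3)
G = christoffel(x)
print(G[0][1][1], -math.sin(x[0]) * math.cos(x[0]))  # Gamma^theta_{phi phi}
print(G[1][0][1], 1.0 / math.tan(x[0]))              # Gamma^phi_{theta phi}
```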
Nakahara also introduces connections on principal bundles and their relation to gauge theory later in his book. In my opinion, this chapter is a bit short and could be more detailed, especially towards the end, but it is a good start.
Yes, $\nabla_v (a\wedge b)=(\nabla_v a)\wedge b+a\wedge(\nabla_v b)$ for any $v\in TM$ and $a,b\in\mathcal{T}^{\bullet,0}M$. This is the Leibniz rule for the tensor product, projected down to the alternating parts. See, for example, this question.
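In flat $\mathbb R^3$ with Cartesian coordinates all Christoffel symbols vanish, so $\nabla_v$ acts on components as the ordinary directional derivative and the rule can be checked numerically. A minimal sketch with two hypothetical 1-forms `a` and `b` (the computation is identical for vector fields), storing $a\wedge b$ as the antisymmetric array $(a\wedge b)_{ij}=a_ib_j-a_jb_i$; note that no sign appears.

```python
import math


def a(x):
    """Hypothetical 1-form a = a_i dx^i on R^3 (components)."""
    return [x[0] * x[1], math.cos(x[2]), x[0] ** 2]


def b(x):
    """Hypothetical 1-form b = b_i dx^i on R^3 (components)."""
    return [math.exp(x[1]), x[2], x[0] - x[1]]


def wedge(u, w):
    """(u ^ w)_{ij} = u_i w_j - u_j w_i."""
    return [[u[i] * w[j] - u[j] * w[i] for j in range(3)] for i in range(3)]


def central_diff(plus, minus, h):
    """Componentwise (plus - minus) / 2h for arbitrarily nested lists."""
    if isinstance(plus, list):
        return [central_diff(p, m, h) for p, m in zip(plus, minus)]
    return (plus - minus) / (2 * h)


def nabla(f, x, v, h=1e-6):
    """Flat covariant derivative nabla_v f at x: directional derivative of the components."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return central_diff(f(xp), f(xm), h)


x, v = [0.4, -0.2, 1.1], [1.0, 2.0, -0.5]
lhs = nabla(lambda y: wedge(a(y), b(y)), x, v)
term1 = wedge(nabla(a, x, v), b(x))
term2 = wedge(a(x), nabla(b, x, v))
max_err = max(abs(lhs[i][j] - term1[i][j] - term2[i][j]) for i in range(3) for j in range(3))
print(max_err)  # finite-difference error only
```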
(The $(-1)^k$ sign in the Leibniz rule for the exterior derivative comes from "moving $v$ past the other vectors" when one imposes the constraint that $v$ also participates in the antisymmetrization.)
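For contrast, the exterior derivative does pick up the sign: for a $k$-form $\alpha$, $d(\alpha\wedge\beta)=d\alpha\wedge\beta+(-1)^k\alpha\wedge d\beta$. A minimal numerical sketch for two hypothetical 1-forms on $\mathbb R^3$ ($k=1$) compares the single component of both sides, the coefficient of $dx^0\wedge dx^1\wedge dx^2$; partial derivatives are central differences, and 2-forms are stored as antisymmetric arrays as in the previous answer.

```python
import math


def alpha(x):
    """Hypothetical 1-form on R^3 (components)."""
    return [x[0] * x[1], math.cos(x[2]), x[0] ** 2]


def beta(x):
    """Hypothetical 1-form on R^3 (components)."""
    return [math.exp(x[1]), x[2], x[0] - x[1]]


def central_diff(plus, minus, h):
    """Componentwise (plus - minus) / 2h for arbitrarily nested lists."""
    if isinstance(plus, list):
        return [central_diff(p, m, h) for p, m in zip(plus, minus)]
    return (plus - minus) / (2 * h)


def partial(f, x, i, h=1e-5):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return central_diff(f(xp), f(xm), h)


def wedge11(u, w):
    """2-form u ^ w of two 1-forms: (u ^ w)_{ij} = u_i w_j - u_j w_i."""
    return [[u[i] * w[j] - u[j] * w[i] for j in range(3)] for i in range(3)]


def d_1form(f, x):
    """(df)_{ij} = d_i f_j - d_j f_i."""
    D = [partial(f, x, i) for i in range(3)]  # D[i][j] = d_i f_j
    return [[D[i][j] - D[j][i] for j in range(3)] for i in range(3)]


def d_2form(f, x):
    """Coefficient of dx0^dx1^dx2 in df for a 2-form f."""
    D = [partial(f, x, i) for i in range(3)]  # D[i][j][k] = d_i f_jk
    return D[0][1][2] - D[1][0][2] + D[2][0][1]


def wedge21(om, w):
    """Coefficient of dx0^dx1^dx2 in (2-form om) ^ (1-form w)."""
    return om[0][1] * w[2] - om[0][2] * w[1] + om[1][2] * w[0]


def wedge12(u, eta):
    """Coefficient of dx0^dx1^dx2 in (1-form u) ^ (2-form eta)."""
    return u[0] * eta[1][2] - u[1] * eta[0][2] + u[2] * eta[0][1]


x = [0.4, -0.2, 1.1]
lhs = d_2form(lambda y: wedge11(alpha(y), beta(y)), x)
rhs_minus = wedge21(d_1form(alpha, x), beta(x)) - wedge12(alpha(x), d_1form(beta, x))
rhs_plus = wedge21(d_1form(alpha, x), beta(x)) + wedge12(alpha(x), d_1form(beta, x))
print(lhs, rhs_minus, rhs_plus)
```

Here `lhs` should match `rhs_minus` up to finite-difference error, while `rhs_plus` (the unsigned rule) does not.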