In what sense is the connection enabling one to compare the vector field at two different points on the manifold [...], when the mapping is from the (Cartesian product of) the set of tangent vector fields to itself? I thought that the connection ∇ "connected" two neighbouring tangent spaces through the notion of parallel transport [...]
Viewing a connection only as a mapping $\nabla: \mathcal{X}(M)\times\mathcal{X}(M)\rightarrow\mathcal{X}(M)$ is too restrictive. Often a connection is also seen as a map $Y\mapsto\nabla Y\in\Gamma(TM\otimes TM^*)$, which highlights the derivative aspect. The important point, however, is that $\nabla$ is $C^\infty(M)$-linear in the first argument, which implies that the value $\nabla_X Y|_p$ depends on $X$ only through $X_p$, in the sense that
$$
X_p=Z_p \Rightarrow \nabla_X Y|_p = \nabla_Z Y|_p.
$$
Hence, for every $v\in TM_p$, $\nabla_vY$ is well-defined. This leads directly to the definition of parallel vector fields and parallel transport (as I think you already know).
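For reference, a sketch of the standard definitions: a vector field $Y$ along a curve $\gamma$ is called parallel if
$$
\nabla_{\gamma'(t)}Y = 0 \quad \text{for all } t,
$$
and the parallel transport of $v\in TM_{\gamma(s)}$ along $\gamma$ to $\gamma(t)$ is the value $Y_{\gamma(t)}$ of the unique parallel field $Y$ along $\gamma$ with $Y_{\gamma(s)}=v$. In local coordinates the condition above is a linear ODE for the components of $Y$, which is why such a field exists and is unique.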
Vice versa, given parallel transport maps $\Gamma(\gamma)^t_s: TM_{\gamma(s)}\rightarrow TM_{\gamma(t)}$, one can recover the connection via
$$
\nabla_X Y|_p = \frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} \quad(\gamma \text{ is an integral curve of }X \text{ with } \gamma(0)=p).
$$
This is exactly the generalisation of directional derivatives in the sense that we vary $Y$ in the direction of $X_p$ in a parallel manner.
In Euclidean space this indeed reduces to the directional derivative: using the identity chart, every vector field can be written as $Y_p=(p,V(p))$ for some $V:\mathbb R^n\rightarrow \mathbb R^n$, and parallel transport is simply given by
$$
\Gamma(\gamma)_s^t (\gamma(s),v)=(\gamma(t),v).
$$
Hence, we find in Euclidean space:
$$
\frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} = \frac{d}{dt}\bigg|_{t=0}(p,V(\gamma(t))) = (p,DV(p)\cdot\gamma'(0)),
$$
which is exactly the directional derivative of $V$ at $p=\gamma(0)$ in direction $v=\gamma'(0)$.
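The Euclidean computation above can be checked numerically. The following is only a minimal sketch: the component function `V`, its hand-computed Jacobian `DV`, the curve `gamma` and the sample point are all hypothetical choices made for illustration. The curve is deliberately not a straight line, to illustrate that only $\gamma(0)=p$ and $\gamma'(0)=v$ matter.

```python
import math

def V(p):
    """Hypothetical component function V: R^2 -> R^2 of the field Y_p = (p, V(p))."""
    x, y = p
    return (x * y, math.sin(x) + y ** 2)

def DV(p):
    """Jacobian of V at p, computed by hand for the V above."""
    x, y = p
    return ((y, x), (math.cos(x), 2.0 * y))

def gamma(t, p, v):
    """A curve with gamma(0) = p and gamma'(0) = v (the t^2 terms bend it)."""
    return (p[0] + t * v[0] + t ** 2, p[1] + t * v[1] - 3.0 * t ** 2)

def transported_Y(t, p, v):
    """Components of Gamma(gamma)_t^0 Y_{gamma(t)}.

    Euclidean parallel transport keeps the components V(gamma(t)) unchanged
    and only moves the base point back to p.
    """
    return V(gamma(t, p, v))

def covariant_derivative(p, v, h=1e-6):
    """Central-difference approximation of d/dt at t = 0 of the transported field."""
    plus = transported_Y(h, p, v)
    minus = transported_Y(-h, p, v)
    return tuple((a - b) / (2.0 * h) for a, b in zip(plus, minus))

def directional_derivative(p, v):
    """The matrix-vector product DV(p) . v."""
    J = DV(p)
    return tuple(J[i][0] * v[0] + J[i][1] * v[1] for i in range(2))

p = (0.5, -1.2)
v = (2.0, 0.3)
numeric = covariant_derivative(p, v)
exact = directional_derivative(p, v)
print(numeric, exact)
```

Up to discretisation error, the two printed tuples should agree.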
Back to the original question: I think it is hard to see how a connection "connects neighbouring tangent spaces" from the axioms alone. You should keep in mind, however, that the contemporary formalism has passed through many layers of abstraction since the beginning and has been reduced to its core, the axioms (for a survey see also Wikipedia). To get the whole picture, it is essential to explore all possible interpretations and consequences of the definition, since these often led to the definition in the first place. In my opinion, the connection is defined as it is with the image in mind that it is an infinitesimal version of parallel transport. Starting from this point, properties such as the Leibniz rule are a consequence. However, having such a differential operator $\nabla$ fulfilling linearity, the Leibniz rule and so on is fully equivalent to having parallel transport in the first place. In modern mathematics, these properties are thus taken as the defining properties/axioms of a connection, mainly because they are easier to handle and easier to generalise to arbitrary vector bundles.
Given this, what does the quantity $\nabla_{e_\mu}e_\nu=\Gamma^\lambda_{\mu\nu}e_\lambda$ represent? [...]
As you wrote, the connection coefficients / Christoffel symbols $\Gamma^\lambda_{\mu\nu}$ are the components of the connection in a local frame and are needed for explicit computations. I think on this level you can't get much meaning out of these coefficients. However, they reappear in a nicer way if you restate everything in the Cartan formalism and study Cartan and/or principal connections. The Wikipedia article on connection forms tries to give an introduction to this approach.
Nakahara also gives an introduction to connections on principal bundles and the relation to gauge theory later on in his book. In my opinion, this chapter is a bit short and could be more detailed, especially towards the end. But it is a good start.
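As a small worked example (the choice of polar coordinates $(r,\theta)$ on the flat plane is mine, purely for illustration), take the coordinate frame $e_r=\partial_r$, $e_\theta=\partial_\theta$ together with the usual Euclidean connection. In Cartesian components $e_r=(\cos\theta,\sin\theta)$ and $e_\theta=(-r\sin\theta,r\cos\theta)$, and differentiating componentwise gives
$$
\nabla_{e_r}e_r = 0,\qquad \nabla_{e_r}e_\theta=\nabla_{e_\theta}e_r=\frac{1}{r}\,e_\theta,\qquad \nabla_{e_\theta}e_\theta=-r\,e_r,
$$
so the only non-vanishing coefficients are $\Gamma^\theta_{r\theta}=\Gamma^\theta_{\theta r}=1/r$ and $\Gamma^r_{\theta\theta}=-r$. The connection itself is flat here; the coefficients merely record how the chosen frame fails to be parallel.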
Best Answer
There is a lot to be said on the subject, but the least technical point of view (in my opinion) is the following:
Consider first the situation in $\mathbb{R}^n$. Let $X,Y \colon \mathbb{R}^n \rightarrow \mathbb{R}^n$ be vector fields. To define the directional derivative of the vector field $X$ in the direction of the vector field $Y$ at a point $p \in \mathbb{R}^n$, we can mimic the usual definition of the directional derivative:
$$ (\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}. $$
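The result $\nabla_Y X$ is a vector field on $\mathbb{R}^n$. You can check that the operation $\nabla$ defined as above satisfies the following two properties:
$$ \nabla_{fY} X = f \, \nabla_Y X, \tag{1} $$
$$ \nabla_Y (fX) = (Yf) \, X + f \, \nabla_Y X. \tag{2} $$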
Here, $X,Y \colon \mathbb{R}^n \rightarrow \mathbb{R}^n$ are vector fields and $f \colon \mathbb{R}^n \rightarrow \mathbb{R}$ is a scalar function. The function $Yf$ (at a point $p$) is the directional derivative of $f$ at $p$ in the direction $Y(p)$.
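Both rules, linearity over functions in the direction argument, $\nabla_{fY}X = f\,\nabla_Y X$, and the Leibniz rule, $\nabla_Y(fX) = (Yf)\,X + f\,\nabla_Y X$, can be sanity-checked numerically in $\mathbb{R}^2$. This is a minimal sketch, not a proof: the fields `X`, `Y`, the function `f` and the sample point are hypothetical, and the limit is replaced by a central difference, which has the same limit for differentiable fields.

```python
import math

def shift(p, y, t):
    """The point p + t*y in R^n."""
    return tuple(pi + t * yi for pi, yi in zip(p, y))

def nabla(Y, X, p, h=1e-6):
    """Central-difference version of (nabla_Y X)(p) = lim (X(p + tY(p)) - X(p)) / t."""
    y = Y(p)
    fwd, bwd = X(shift(p, y, h)), X(shift(p, y, -h))
    return tuple((a - b) / (2.0 * h) for a, b in zip(fwd, bwd))

# Hypothetical sample data on R^2.
def X(p):
    return (p[0] * p[1], math.cos(p[1]))

def Y(p):
    return (p[1], 1.0 + p[0] ** 2)

def f(p):
    return math.exp(p[0]) + p[1]

def fY(p):
    return tuple(f(p) * c for c in Y(p))

def fX(p):
    return tuple(f(p) * c for c in X(p))

def Yf(p, h=1e-6):
    """Directional derivative of f at p in the direction Y(p)."""
    y = Y(p)
    return (f(shift(p, y, h)) - f(shift(p, y, -h))) / (2.0 * h)

p = (0.3, 0.7)

# Linearity over functions in the direction argument.
lhs1 = nabla(fY, X, p)
rhs1 = tuple(f(p) * c for c in nabla(Y, X, p))

# Leibniz rule in the differentiated argument.
lhs2 = nabla(Y, fX, p)
rhs2 = tuple(Yf(p) * a + f(p) * b for a, b in zip(X(p), nabla(Y, X, p)))

print(lhs1, rhs1)
print(lhs2, rhs2)
```

Each printed pair should agree up to discretisation error.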
Now let us try and mimic the above construction on a general manifold. Given vector fields $X,Y \in \mathfrak{X}(M)$, we try to use the same formula and define
$$ (\nabla_Y X)(p) := \lim_{t \to 0} \frac{X(p + tY(p)) - X(p)}{t}. $$
However, we see that there are two problems. First, the expression $X(p + tY(p))$ is not defined because we don't have a way of adding a tangent vector $tY(p) \in T_pM$ to a point $p \in M$. This is not so bad because we can actually replace the expression $p + tY(p)$ with any curve "which goes in the direction $Y(p)$", such as the flow $\varphi_t^Y(p)$. The more serious problem is that we need to subtract the tangent vector $X(p) \in T_pM$ from the tangent vector $X(\varphi_t^Y(p)) \in T_{\varphi_t^Y(p)}M$, and those are two tangent vectors that belong to different vector spaces. In general, without any extra data, we have no way of identifying tangent spaces at different points of $M$.
To summarize, we see that we can differentiate vector fields along vector fields without any problem on $\mathbb{R}^n$ but we encounter problems when we try to do it on a general manifold. But $\mathbb{R}^n$ is also a manifold, so what makes it special? It is not only a manifold but also a vector space and an affine space, so we can add vectors to points and identify tangent spaces at different points using translations. This is something we don't have on a general manifold.
The definition of an affine connection is meant to supply the manifold $M$ "externally" with an operation $\nabla \colon \mathfrak{X}(M) \times \mathfrak{X}(M) \rightarrow \mathfrak{X}(M)$ which satisfies properties $(1)-(2)$ and so allows us to differentiate vector fields along vector fields. That is, instead of defining the directional derivative of a vector field along a vector field, we require that somebody hands us a mechanism $\nabla$ which satisfies the properties that the familiar derivative satisfied on $\mathbb{R}^n$, and then we will think of it as a directional derivative.
Obviously this raises quite a lot of questions. Does such a mechanism always exist? (Yes). Is it unique? (No). Is there a natural choice of such a differentiation mechanism? (Yes, under certain circumstances). Can we use this mechanism to recover the ability to identify tangent vectors at different points that was necessary to define the regular directional derivative in $\mathbb{R}^n$? (Yes, at least along curves. This leads to the notion of parallel transport). I refer you to the extensive Wikipedia article on the covariant derivative (which is pretty much another name for an affine connection) for further details.
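To make the second problem concrete, here is a small sketch on the unit sphere in $\mathbb{R}^3$. The embedding is an extra assumption that an abstract manifold does not come with; it is used here only so that the subtraction can be carried out at all, in the ambient space. The rotation field `X`, its flow and the base point are hypothetical choices, and $Y = X$ is taken for simplicity. The resulting "derivative" has a component normal to the sphere at $p$, so it is not an element of $T_pM$.

```python
import math

def X(q):
    """Rotation field X(q) = (-y, x, 0); tangent to the unit sphere since X(q) . q = 0."""
    x, y, z = q
    return (-y, x, 0.0)

def flow(t, q):
    """Flow of X: rotation by the angle t about the z-axis."""
    x, y, z = q
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y, z)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

theta0 = 1.0  # colatitude of the base point (arbitrary choice)
p = (math.sin(theta0), 0.0, math.cos(theta0))

# Naive difference quotient along the flow, computed in ambient R^3
# (central difference in t for better accuracy).
h = 1e-5
naive = tuple(
    (a - b) / (2.0 * h)
    for a, b in zip(X(flow(h, p)), X(flow(-h, p)))
)

# Component along the unit normal p; it would vanish if `naive` were tangent at p.
normal_part = dot(naive, p)
print(naive, normal_part)
```

For this field the normal component is approximately $-\sin^2\theta_0$, which is non-zero away from the poles. Projecting the result back onto the tangent plane at $p$ is one possible fix, and it amounts to a particular choice of connection.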