The notes define a "vector manifold" in a geometric algebra $V$ simply as a subset $W\subset V$, but since the author goes on to talk about smooth paths and tangent vectors, I expect that he means a smoothly embedded submanifold of $V$.
The Whitney embedding theorem guarantees that any smooth manifold can be smoothly embedded into Euclidean space of sufficiently high dimension. There are geometric algebras of arbitrarily high dimension (the Clifford algebra $Cl_n(\mathbb{R})$ is a vector space of dimension $2^n$). So any smooth manifold can be realized as a vector manifold.
The traditional abstract approach to defining manifolds is already automatically coordinate-free, and makes no reference to any ambient space. However, defining manifolds in ambient Euclidean space is equivalent to the coordinate-free approach; for example, standard textbooks such as Differential Topology (Guillemin-Pollack) define and work with manifolds entirely in ambient space.
In what sense is the connection enabling one to compare the vector field at two different points on the manifold [...], when the mapping is from the (Cartesian product of) the set of tangent vector fields to itself? I thought that the connection ∇ "connected" two neighbouring tangent spaces through the notion of parallel transport [...]
Seeing a connection only as a mapping $\nabla: \mathcal{X}(M)\times\mathcal{X}(M)\rightarrow\mathcal{X}(M)$ is too restrictive. Often a connection is also seen as a map $Y\mapsto\nabla Y\in\Gamma(TM\otimes TM^*)$, which highlights the derivative aspect. However, the important point is that $\nabla$ is $C^\infty(M)$-linear in the first argument, which implies that the value $\nabla_X Y|_p$ depends on $X$ only through $X_p$, in the sense that
$$
X_p=Z_p \Rightarrow \nabla_X Y|_p = \nabla_Z Y|_p.
$$
Hence, for every $v\in TM_p$, $\nabla_vY$ is well-defined. This leads directly to the definition of parallel vector fields and parallel transport (as I think you already know).
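To spell out why $C^\infty(M)$-linearity gives pointwise dependence, here is a sketch in a local frame. The frame $e_\mu$ near $p$ and the component functions $X^\mu$ are notation assumed for illustration, and the sketch takes for granted that $\nabla_X Y|_p$ depends only on $X$ in a neighbourhood of $p$ (the usual bump-function argument):
$$
X = X^\mu e_\mu \quad\Rightarrow\quad \nabla_X Y|_p = X^\mu(p)\,\nabla_{e_\mu}Y|_p .
$$
The right-hand side involves $X$ only through the numbers $X^\mu(p)$, that is, through $X_p$, so two vector fields with $X_p=Z_p$ give the same value at $p$.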
Vice versa, given parallel transport maps $\Gamma(\gamma)^t_s: TM_{\gamma(s)}\rightarrow TM_{\gamma(t)}$, one can recover the connection via
$$
\nabla_X Y|_p = \frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} \quad(\gamma \text{ is an integral curve of }X\text{ with }\gamma(0)=p).
$$
This is exactly the generalisation of directional derivatives in the sense that we vary $Y$ in direction of $X_p$ in a parallel manner.
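In a local frame, parallel transport can be made explicit. The following is a sketch, assuming a frame $e_\mu$ with connection coefficients defined by $\nabla_{e_\mu}e_\nu=\Gamma^\lambda_{\mu\nu}e_\lambda$ and writing $\gamma'=\dot\gamma^\mu e_\mu$ (both notational assumptions). A vector field $Y=Y^\nu e_\nu$ along $\gamma$ is parallel when
$$
\frac{dY^\lambda}{dt} + \Gamma^\lambda_{\mu\nu}(\gamma(t))\,\dot\gamma^\mu(t)\,Y^\nu(t) = 0 .
$$
This is a linear first-order ODE, so each initial vector determines a unique solution, and the map sending the value of the solution at $\gamma(s)$ to its value at $\gamma(t)$ is the parallel transport map $\Gamma(\gamma)^t_s$.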
In Euclidean space this indeed reduces to the directional derivative: Using the identity chart every vector field can be written as $Y_p=(p,V(p))$ for $V:\mathbb R^n\rightarrow \mathbb R^n$ and the parallel transport is just given by
$$
\Gamma(\gamma)_s^t (\gamma(s),v)=(\gamma(t),v).
$$
Hence, we find in Euclidean space:
$$
\frac{d}{dt}\bigg|_{t=0}\Gamma(\gamma)_t^0Y_{\gamma(t)} = \frac{d}{dt}\bigg|_{t=0}(p,V(\gamma(t))) = (p,DV(p)\cdot\gamma'(0)),
$$
which is exactly the directional derivative of $V$ in direction $v=\gamma'(0)$.
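As a concrete check, here is a small worked example; the particular field, point, and curve are chosen purely for illustration. Take $V(x,y)=(-y,x)$ on $\mathbb R^2$, $p=(1,0)$ and $\gamma(t)=(1,t)$, so that $\gamma'(0)=(0,1)$. Then
$$
\frac{d}{dt}\bigg|_{t=0}V(\gamma(t)) = \frac{d}{dt}\bigg|_{t=0}(-t,1) = (-1,0),
\qquad
DV\cdot\gamma'(0) = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}0\\ 1\end{pmatrix} = \begin{pmatrix}-1\\ 0\end{pmatrix},
$$
so both computations agree, as expected.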
Back to the original question: I think it is hard to see how a connection "connects neighbouring tangent spaces" from the axioms alone. You should keep in mind, however, that the contemporary formalism has passed through many layers of abstraction since the beginning and has been reduced to its core, the axioms (for a survey see also Wikipedia). To get the whole picture, it is essential to explore all possible interpretations and consequences of the definition, since these often led to the definition in the first place. In my opinion, the connection is defined as it is with the image in mind that it is an infinitesimal version of parallel transport. Starting from this point, properties such as the Leibniz rule are a consequence. However, having such a differential operator $\nabla$ fulfilling linearity, the Leibniz rule and so on is fully equivalent to having parallel transport in the first place. In modern mathematics, these properties are thus taken as the defining properties/axioms of a connection, mainly because they are easier to handle and easier to generalise to arbitrary vector bundles.
Given this, what does the quantity $\nabla_{e_\mu}e_\nu=\Gamma^\lambda_{\mu\nu}e_\lambda$ represent? [...]
As you wrote, the connection coefficients / Christoffel symbols $\Gamma^\lambda_{\mu\nu}$ are the components of the connection in a local frame and are needed for explicit computations. I think on this level you can't get much meaning out of these coefficients. However, they reappear in a nicer way if you restate everything in the Cartan formalism and study Cartan and/or principal connections. The Wikipedia article on connection forms tries to give an introduction to this approach.
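To illustrate how the coefficients enter explicit computations, here is a sketch in components, with $X=X^\mu e_\mu$ and $Y=Y^\nu e_\nu$ as assumed notation. Linearity in $X$ and the Leibniz rule in $Y$ give
$$
\nabla_X Y = X^\mu\left(e_\mu(Y^\lambda) + \Gamma^\lambda_{\mu\nu}Y^\nu\right)e_\lambda .
$$
The coefficients depend strongly on the chosen frame. As an illustrative example (assuming the flat connection on the Euclidean plane), the Cartesian frame has all $\Gamma^\lambda_{\mu\nu}=0$, while the polar coordinate frame $(\partial_r,\partial_\theta)$ has the non-vanishing coefficients
$$
\Gamma^r_{\theta\theta} = -r, \qquad \Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = \frac{1}{r},
$$
even though the connection itself is the same. This is one reason why it is hard to attach intrinsic meaning to the individual coefficients.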
Nakahara also gives an introduction to connections on principal bundles and the relation to gauge theory later on in his book. In my opinion, this chapter is a bit short and could be more detailed, especially towards the end. But it is a good start.
I think your current intuition is pretty good; let me see if I can make it a bit more precise while still keeping things conceptual. I'm not sure how you defined tangent vectors in your class, but one way of doing this is as follows.
Pick a point $x \in M$. A tangent vector at $x$ is an equivalence class of curves $\gamma : (-1,1) \to M$ satisfying $\gamma(0) = x$ under the equivalence relation that they are tangent at $x$. The definition of "tangent at $x$" actually requires one to pick a chart on a neighborhood of $x$ and then translates into the statement that the derivatives at $0$ of the curves, read in the chart, are equal in $\mathbb{R}^n$; it turns out that this is chart independent and gives a genuine equivalence relation. So a tangent vector really does feel like an arrow in $M$ with this definition. Interpreting a vector field as a function taking each $x \in M$ to a tangent vector at $x$ also seems pretty reasonable. Recall that we refer to the set of all tangent vectors at $x \in M$ as the tangent space $T_xM$, and that the collection of all tangent spaces forms a new manifold $TM$ called the tangent bundle (lots of grunt work goes into proving this).
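The chart independence can be sketched with the chain rule. Here $\varphi$ and $\psi$ are two charts around $x$, both introduced only for illustration. Two curves $\gamma_1,\gamma_2$ through $x$ are tangent at $x$ when $(\varphi\circ\gamma_1)'(0)=(\varphi\circ\gamma_2)'(0)$, and for any curve $\gamma$ with $\gamma(0)=x$
$$
(\psi\circ\gamma)'(0) = D(\psi\circ\varphi^{-1})_{\varphi(x)}\,(\varphi\circ\gamma)'(0) .
$$
Since the same linear map $D(\psi\circ\varphi^{-1})_{\varphi(x)}$ is applied to both curves, equality of the derivatives in the chart $\varphi$ implies equality in the chart $\psi$, and vice versa by symmetry.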
Out of this nice geometric picture you can also see how to define a "differential operator" from a vector field $v$. Work pointwise in $M$. Given any differentiable function $f : M \to \mathbb{R}$, you can operate on $f$ with the tangent vector $v(x) \in T_xM$ by taking a representative curve $\gamma : (-1,1) \to M$, forming the composition $f \circ \gamma : (-1,1) \to \mathbb{R}$ and taking the derivative at zero, $(f \circ \gamma)^\prime(0)$ (this gives you a number). Now one has to show that this depends only on the equivalence class of $\gamma$ (i.e. depends only on $v(x)$) and in fact defines a smooth function $M \to \mathbb{R}$ as $x$ varies over $M$. One can work in charts to verify that the operation we've defined satisfies the product rule etc. So in this way the vector field $v$ gives a differential operator. Note that all of this boils down to looking at how a single tangent vector allows us to assign a number to a function; this way of thinking of a tangent vector is usually described by calling it a derivation.
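In a chart the construction becomes an ordinary directional derivative. As a sketch, assume a chart $\varphi$ around $x$ with coordinates $u^1,\dots,u^n$ on $\mathbb{R}^n$, and write $a=(\varphi\circ\gamma)'(0)$ (all of this is notation introduced for illustration). The chain rule gives
$$
(f\circ\gamma)'(0) = \sum_{i=1}^n a^i\,\frac{\partial (f\circ\varphi^{-1})}{\partial u^i}\bigg|_{\varphi(x)} ,
$$
which depends on $\gamma$ only through $a$, hence only on the equivalence class of $\gamma$. The product rule follows from the one-variable product rule applied to $(fg)\circ\gamma=(f\circ\gamma)(g\circ\gamma)$:
$$
((fg)\circ\gamma)'(0) = f(x)\,(g\circ\gamma)'(0) + g(x)\,(f\circ\gamma)'(0) .
$$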
It turns out you can start with just the idea of differentiation/derivations and define your tangent space from that (which is what I am guessing your class did). Doing so has certain advantages, such as avoiding charts and equivalence classes. Also, if the notion of a derivation is interpreted appropriately, the construction via differentiation has a certain generality because it is purely "algebraic".