I begin with a brief(ish) review, and then answer your question directly afterward.

**Tangent Vectors**

In your standard differential geometry treatment, given a smooth curve $\gamma:[-1,1]\rightarrow \mathcal M$ with $\mathcal M$ a smooth manifold, we can define the tangent vector $X_{\gamma,p}$ to $\gamma$ at the point $p\equiv \gamma(0)$ to be the linear map which eats a smooth function $f:\mathcal M\rightarrow \mathbb R$ and spits out the $\mathbb R$-number
$$X_{\gamma, p} (f) := (f\circ \gamma)'(0)$$

The tangent space $T_p \mathcal M$ at $p$ is the set of all tangent vectors to smooth curves which pass through $p$. It is not immediately obvious that this set forms a vector space, but it is a standard elementary exercise in differential geometry to prove this fact.

**Expansion in a coordinate-induced basis**

If we have a coordinate chart $(U,x)$ with $x:\mathcal M \rightarrow \mathbb R^n$ a chart map and $U\ni p$, then we may write

$$X_{\gamma,p}(f) = (f\circ \gamma)'(0) = \bigg[\big(f\circ x^{-1}\big)\circ(x\circ \gamma)\bigg]'(0) = \partial_m(f\circ x^{-1})\big|_{x(p)} \cdot (x^m\circ \gamma)'(0)$$

where we make use of the multivariable chain rule. It is standard to define the shorthand $\partial_m(f\circ x^{-1}) \equiv \frac{\partial f}{\partial x^m}$ and $(x^m\circ \gamma)'(0) \equiv X_{\gamma,p}^m$ (i.e. the $m^{th}$ component of the vector $X_{\gamma,p}$), in which case we have

$$ X_{\gamma,p}(f) = X_{\gamma,p}^m \left.\frac{\partial f}{\partial x^m}\right|_{x(p)} \rightarrow X_{\gamma,p} = X_{\gamma,p}^m \left.\frac{\partial}{\partial x^m}\right|_{x(p)}$$
It is in this sense that the partial derivative operators - understood to include evaluation at the coordinates $x(p)$ - constitute a basis for the tangent space $T_p \mathcal M$.
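As a sanity check on the chain-rule expansion above (not part of the original argument), here is a minimal `sympy` sketch on $\mathcal M = \mathbb R^2$ with the identity chart; the particular curve and test function are hypothetical choices made for illustration:

```python
import sympy as sp

t, x1, x2 = sp.symbols('t x1 x2')

# A concrete (hypothetical) curve gamma(t) = (cos t, e^t) in R^2, identity chart,
# and a concrete smooth test function f on R^2.
gamma = (sp.cos(t), sp.exp(t))
f = x1**2 * x2 + sp.sin(x2)

# Left-hand side: X(f) = (f o gamma)'(0)
lhs = sp.diff(f.subs({x1: gamma[0], x2: gamma[1]}, simultaneous=True), t).subs(t, 0)

# Right-hand side: X^m * (df/dx^m) evaluated at gamma(0) = (1, 1),
# with X^m = (x^m o gamma)'(0)
p = {x1: 1, x2: 1}
components = [sp.diff(g, t).subs(t, 0) for g in gamma]
rhs = sum(c * sp.diff(f, v).subs(p) for c, v in zip(components, (x1, x2)))

assert sp.simplify(lhs - rhs) == 0
```

The two sides agree for any smooth $f$; the assertion simply verifies it for this particular choice.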

**Vector Fields**

Having defined an individual tangent vector, we can define a vector *field* to be essentially an assignment of a vector to each point. It's important to note that the space of vector **fields** can be understood not only as a vector space over the reals (meaning that vector fields can be added and multiplied by $\mathbb R$-numbers), but in fact as a *module* over the smooth functions (meaning that vector fields can be added and multiplied by smooth functions $\mathcal M\rightarrow \mathbb R$). This will be important momentarily.

Because any coordinate chart $(U,x)$ is a diffeomorphism from $U\subseteq \mathcal M$ onto $x(U)\subseteq \mathbb R^n$, we can show that $\big\{\frac{\partial}{\partial x^m}\big\}$ constitutes a set of $\mathrm{dim}(\mathcal M)$ linearly independent vectors at each point of $U$. As a result, one may take *any* vector field $V$ on $U$ and expand it in the coordinate basis as $V = V^m \partial/\partial x^m$.

**Non-holonomic Bases**

Of course, if we want to expand a vector field $V$ on $U\subseteq \mathcal M$ in terms of a basis $\{\hat e_m\}$, we only require that $\{\hat e_m\big|_{p}\}$ constitute a basis (i.e. a linearly independent spanning set) of $T_p\mathcal M$ for each $p\in U$. As mentioned above, the set $\{\partial/\partial x^m\}$ satisfies this condition; any basis which is defined from a coordinate chart in this way is called *holonomic*. One might reasonably ask whether *every* basis is holonomic - that is, whether every basis can be understood as the partial derivatives with respect to some coordinates - and the answer to that question is **no**.

This can be demonstrated easily - consider $\mathbb R^2$ equipped with the standard Cartesian coordinates $(x^1,x^2)$. Because mixed partial derivatives commute, one can see immediately that the commutator $[\partial/\partial x^m,\partial/\partial x^n]$ applied to any smooth function yields zero. Define the basis $$\{\hat e_1,\hat e_2\} = \left\{\frac{\partial}{\partial x^1}, e^{x^1} \frac{\partial}{\partial x^2}\right\}$$
and note that the commutator $[\hat e_1,\hat e_2]$ *doesn't* vanish. Since $\{\hat e_1,\hat e_2\}$ constitutes a valid basis with a non-vanishing commutator, it must be a *non-holonomic basis*. The fact that we can multiply vector fields by *smooth functions* like $e^{x^1}$ (rather than simply by $\mathbb R$-numbers) traces back to the fact that the set of vector fields is not just a vector space over $\mathbb R$, but in fact a module over the ring of smooth functions $\mathcal M\rightarrow \mathbb R$.
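If it helps, the non-vanishing of that commutator can be checked mechanically. The following is a small `sympy` sketch, applying $\hat e_1$ and $\hat e_2$ (as differential operators) to a generic smooth function:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.Function('f')(x1, x2)

# e1 = d/dx1 and e2 = exp(x1) d/dx2, acting on functions
e1 = lambda g: sp.diff(g, x1)
e2 = lambda g: sp.exp(x1) * sp.diff(g, x2)

# [e1, e2]f = e1(e2(f)) - e2(e1(f)); the second derivatives cancel,
# leaving the first-order operator exp(x1) d/dx2 = e2
commutator = sp.simplify(e1(e2(f)) - e2(e1(f)))

assert sp.simplify(commutator - sp.exp(x1) * sp.diff(f, x2)) == 0
```

Note that the mixed second derivatives cancel in the commutator, which is why $[\hat e_1,\hat e_2]$ is again a (first-order) vector field, here equal to $\hat e_2$ itself.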

> First, the Christoffel symbol is defined as $\nabla_{\hat e_\mu} \hat e_\nu = \partial_\mu \hat e_\nu = \Gamma^\alpha_{\mu \nu} \hat e_\alpha$ [...] Here we explicitly have $\partial_\mu$ and $\hat e_\nu$ as different objects, with one acting on the other. Do you expect me to believe that $\partial_{\mu}\partial_{\nu}=\Gamma^{\alpha}_{\mu\nu}\partial_{\alpha}$ or $\hat{e}_{\mu}\hat{e}_{\nu}=\Gamma^{\alpha}_{\mu\nu}\hat{e}_{\alpha}$?

No, because those expressions are incorrect. The definition of the Christoffel symbol is
$$\nabla_{\partial_\mu} \partial_\nu \equiv \Gamma^\alpha_{\mu \nu} \partial_\alpha$$
where for brevity I define $\partial_\mu \equiv \partial/\partial x^\mu$ as per standard convention. This is a *primitive* definition, which defines behavior of the covariant derivative $\nabla$. In particular, it is **not** defined as $\nabla_{\partial_\mu} \partial_\nu = \partial_\mu \partial_\nu$. You may be confused by the fact that $\nabla_{\partial_\mu} f := \partial_\mu f$ for scalar functions $f$, but the covariant derivative acts differently on tangent vectors.

The Christoffel symbols are specifically the connection coefficients of the Levi-Civita connection **in a holonomic basis**. If you want to consider the derivatives of generic vector fields, you must expand them in these bases first. For example, let $\hat e_\mu = a^\alpha_\mu \partial_\alpha$ for some functions $a^\alpha_\mu$; we would then have
$$\nabla_{\hat e_\mu} \hat e_\nu = a^\alpha_\mu \nabla_{\partial_\alpha}\big(a^\beta_\nu \partial_\beta\big) $$
$$= a^\alpha_\mu \partial_\alpha(a^\beta_\nu) \partial_\beta + a^\alpha_\mu a^\beta_\nu \Gamma^\gamma_{\alpha\beta} \partial_\gamma$$
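As a concrete check of this formula (my addition, not part of the original answer), take flat $\mathbb R^2$ in Cartesian coordinates, where all Christoffel symbols vanish, together with the non-holonomic basis $\{\partial_1,\, e^{x^1}\partial_2\}$ from earlier. The `sympy` sketch below computes $\nabla_{\hat e_\mu}\hat e_\nu$ from the first term alone:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
coords = (x1, x2)

# Coefficients a^alpha_mu of the basis e1 = d1, e2 = exp(x1) d2,
# stored as a[mu][alpha]
a = [[sp.Integer(1), sp.Integer(0)],
     [sp.Integer(0), sp.exp(x1)]]

# In Cartesian coordinates on flat R^2 all Gamma vanish, so
# nabla_{e_mu} e_nu = a^alpha_mu d_alpha(a^beta_nu) d_beta.
def covariant(mu, nu):
    """Components (in the d_beta basis) of nabla_{e_mu} e_nu."""
    return [sum(a[mu][al] * sp.diff(a[nu][be], coords[al]) for al in range(2))
            for be in range(2)]

# nabla_{e1} e2 = exp(x1) d2, while nabla_{e2} e1 = 0; their difference
# reproduces the commutator [e1, e2] = exp(x1) d2 computed earlier
assert covariant(0, 1) == [0, sp.exp(x1)]
assert covariant(1, 0) == [0, 0]
```

Since the flat connection is torsion-free, $\nabla_{\hat e_1}\hat e_2 - \nabla_{\hat e_2}\hat e_1$ indeed matches the commutator $[\hat e_1,\hat e_2]=e^{x^1}\partial_2$ found above.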

> Second, for GA we have $\hat e_\mu \hat e_\nu = g_{\mu\nu} + \hat e_\mu\wedge \hat e_\nu$ which obviously doesn't work for $\hat e_\mu\rightarrow \partial_\mu$.

Sure it does. The expression you have written defines the *geometric product* $\hat e_\mu \star \hat e_\nu$. Once again, this is a primitive definition. I suspect you are confused because you are under the impression that if $\hat e_\mu$ and $\hat e_\nu$ are partial derivatives $\partial_\mu$ and $\partial_\nu$, then $\hat e_\mu \star \hat e_\nu= \partial_\mu \star \partial_\nu$ must be equal to the second partial derivative $\partial_\mu \partial_\nu$, which is not true. I imagine this misconception arises because in geometric algebra we often don't use a special symbol for the geometric product, but we must not confuse the geometric product of two vector (fields) with the *composition* of vector fields when the latter are understood to be differential operators on a manifold.
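One way to see that the composition $\partial_\mu \partial_\nu$ cannot be any kind of product of vector fields: it fails the Leibniz rule, so it is not even a first-order operator. A small `sympy` sketch (my illustration, with generic functions):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.Function('f')(x1, x2)
g = sp.Function('g')(x1, x2)

d1 = lambda h: sp.diff(h, x1)
d2 = lambda h: sp.diff(h, x2)

# The composition d1 o d2 violates the Leibniz rule on a product f*g,
# so it is a second-order operator, not a vector field.
leibniz_defect = sp.simplify(d1(d2(f * g)) - d1(d2(f)) * g - f * d1(d2(g)))
assert leibniz_defect != 0

# The commutator [d1, d2], by contrast, is first-order -- for a
# holonomic basis it vanishes identically.
assert sp.simplify(d1(d2(f)) - d2(d1(f))) == 0
```

The geometric product, whatever symbol one uses for it, is a purely algebraic operation on (fields of) vectors; it is not the operator composition demonstrated above.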

> Third, if a vector field is defined as the space of curves through different points, then the parameters must be equal to the number of dimensions (with one defining the curves, and $d-1$ defining the class of curves to fill all space) [...]

I'm not really sure what this means, as curves are 1D objects which depend on a single parameter. Let $\mathcal M=S^2$, the $2$-sphere consisting of the set
$$S^2 := \left\{(a,b,c) \in \mathbb R^3 \ \bigg| \ a^2+b^2+c^2=1\right\}$$

Consider the curve $\gamma:t \mapsto \big(\cos(t),\sin(t),0\big)\in S^2$. The tangent vector $X_{\gamma,(1,0,0)}$ eats a function $f$ and spits out
$$X_{\gamma,(1,0,0)}(f) = \frac{d}{dt}f\big(\cos(t),\sin(t),0\big)\bigg|_{t=0}$$
If we choose the coordinate chart $(\theta,\phi)$ as usual, we will find that
$$X_{\gamma,(1,0,0)} = \frac{\partial}{\partial \phi}\bigg|_{(\theta,\phi)=(\pi/2,0)}$$
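This identification can be verified directly. Below is a `sympy` sketch using the usual spherical chart $(a,b,c)=(\sin\theta\cos\phi,\,\sin\theta\sin\phi,\,\cos\theta)$; the particular test function is a hypothetical choice for illustration:

```python
import sympy as sp

t, theta, phi = sp.symbols('t theta phi')
a, b, c = sp.symbols('a b c')

# A concrete (hypothetical) smooth test function on ambient R^3, restricted to S^2
f = a + 2*b + 3*c**2

# Tangent vector to the equatorial curve gamma(t) = (cos t, sin t, 0) at t = 0
lhs = sp.diff(f.subs({a: sp.cos(t), b: sp.sin(t), c: 0}, simultaneous=True),
              t).subs(t, 0)

# The same function in spherical coordinates, differentiated by phi
# and evaluated at (theta, phi) = (pi/2, 0)
chart = {a: sp.sin(theta)*sp.cos(phi),
         b: sp.sin(theta)*sp.sin(phi),
         c: sp.cos(theta)}
rhs = sp.diff(f.subs(chart, simultaneous=True), phi).subs({theta: sp.pi/2, phi: 0})

assert sp.simplify(lhs - rhs) == 0
```

Both computations depend only on the single curve parameter $t$ (or, after the chart, on $\phi$ along the curve); no second parameter is needed to define the tangent vector at a point.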

> It seems to me things would be much more consistent if when people wrote $v=v^{\mu}\partial_{\mu}$ they actually meant $\nabla_v$, a **scalar** differential operator, not to be confused with the **vector** $v=v^{\mu}\hat{e}_\mu$. Then Lie brackets would make much more sense defined as $\mathcal{L}(v, u)=\nabla_v u - \nabla_u v$ instead of $\mathcal{L}(v, u)=[v, u]$.

The point you are overlooking is that tangent vectors - defined as I have done above - exist upstream of any notion of a connection. A covariant derivative $\nabla$ requires additional structure in the form of a connection/parallel transport which enables us to move vectors from one point to another (thus rendering the idea of a difference quotient well-defined), and therefore does not exist on a smooth manifold which does *not* have this structure. This occurs e.g. when we look at Hamiltonian mechanics as dynamics on a symplectic manifold, which is not equipped with a connection.

The power of Lie derivatives arises because they do **not** require the additional technology of a connection. Given a sufficiently well-behaved vector field, we can define a flow by pushing points along its integral curves. This allows us to move points, which allows us to move entire curves; since tangent vectors are defined in terms of smooth curves, they too can be pushed along a flow, and in this way we can define difference quotients with no need for a connection. It turns out that if $u$ and $v$ are such vector fields, then $\mathscr L_u(v) = [u,v]$, where the vector field commutator is defined in terms of the composition of differential operators. **If** our manifold is equipped with a connection **and** that connection happens to be torsion-free, then $\mathscr L_u(v) = \nabla_u v - \nabla_v u$, but note that in general neither of these two conditions is satisfied (see e.g. the torsion tensor).
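For a concrete instance of this identity (my addition), take flat $\mathbb R^2$ in Cartesian coordinates, where the Levi-Civita connection has $\Gamma=0$ and is torsion-free; the two particular vector fields below are hypothetical choices:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.Function('f')(x1, x2)
coords = (x1, x2)

# Hypothetical concrete vector fields u = x2 d1 and v = x1 d2 on R^2,
# stored by components
u = [x2, sp.Integer(0)]
v = [sp.Integer(0), x1]

apply_field = lambda w, h: sum(wi * sp.diff(h, xi) for wi, xi in zip(w, coords))

# Lie bracket as a commutator of differential operators: [u,v]f = u(v(f)) - v(u(f))
bracket_f = sp.simplify(apply_field(u, apply_field(v, f))
                        - apply_field(v, apply_field(u, f)))

# With Gamma = 0 (flat, torsion-free), nabla_u v - nabla_v u has
# components u^m d_m(v^k) - v^m d_m(u^k)
comps = [sp.simplify(apply_field(u, vk) - apply_field(v, uk))
         for uk, vk in zip(u, v)]

assert sp.simplify(bracket_f - apply_field(comps, f)) == 0
```

The second derivatives of $f$ cancel in the bracket, and the surviving first-order operator agrees with $\nabla_u v - \nabla_v u$, as the torsion-free identity predicts.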

The *drawback* to the Lie derivative $\mathscr L_u$ is that it requires $u$ to be a (well-behaved) vector **field**. In contrast, the covariant derivative $\nabla_V$ is perfectly happy for $V$ to be a single, solitary *vector*, and it turns out that this extra technology is what is needed in GR. As such, we need to introduce a connection and a covariant derivative.

## Best Answer

An arrow-vector in Euclidean space is essentially a translation operator. Usually Euclidean space is considered to be a vector space itself (e.g. points = vectors), but Euclidean space is a linear (vector) space only if you choose an origin, which is unnecessary structure (in Euclidean space all points are equal, so there is no reason to single one out as unique).

A vector space whose "origin is forgotten" is what is called an affine space. In an affine space, points are not equivalent to vectors. Vectors provide translations of points into other points. I won't state the actual axioms that define an affine space precisely; I just want you to think of points as... well, points, and vectors as translations.

Let $f$ be a smooth function on Euclidean space, and let $a$ be a vector. The point $P$ can be translated by $a$ into $P+a$. The translation acts on the function by an operator $\hat{T}_a$ as $$ (\hat{T}_a[f])(P)=f(P+a). $$ If $\epsilon$ is an "infinitesimal", then $f(P+\epsilon a)$ is expanded as $$ f(P+\epsilon a)=f(P)+\frac{\partial f}{\partial x^\mu}|_{x=P}a^\mu\epsilon+O(\epsilon^2), $$ but $f(P+\epsilon a)=(\hat T_{\epsilon a}[f])(P)$, so the operator that translates by $\epsilon a$ is given as $$ \hat T_{\epsilon a}=1+\epsilon a^\mu\frac{\partial}{\partial x^\mu}. $$
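The first-order expansion of $\hat T_{\epsilon a}$ can be checked symbolically. Here is a minimal `sympy` sketch on $\mathbb R^2$; the displacement vector and test function are hypothetical choices for illustration:

```python
import sympy as sp

eps, x1, x2 = sp.symbols('epsilon x1 x2')
a = (2, -1)  # a hypothetical constant displacement vector

f = sp.exp(x1) * sp.cos(x2)  # a concrete smooth test function

# Exact translated function f(P + eps*a)
translated = f.subs({x1: x1 + eps*a[0], x2: x2 + eps*a[1]}, simultaneous=True)

# The first-order operator (1 + eps * a^mu d_mu) applied to f
first_order = f + eps * (a[0]*sp.diff(f, x1) + a[1]*sp.diff(f, x2))

# Their difference is O(eps^2): the series to first order in eps is identical
difference = sp.series(translated - first_order, eps, 0, 2).removeO()
assert sp.simplify(difference) == 0
```

To first order in $\epsilon$, translating the function and applying $1+\epsilon a^\mu\partial_\mu$ agree exactly, which is the sense in which the infinitesimal translation *is* the differential operator.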

This shows that when you consider a vector as an *infinitesimal* arrow, describing an *infinitesimal* displacement, it is natural to think of it as a differential operator. A manifold is only *infinitesimally* like Euclidean space, hence why this is the primary interpretation of vectors in differential geometry.