I'll say a few words about how I think about covariant derivatives, which is really just expanding on janmarqz's comment (hopefully others will contribute their own viewpoints as well):
For me, the most important geometric idea behind a covariant derivative $\nabla$ is that given a curve $\gamma$ in a manifold $M$, $\nabla$ gives you an isomorphism between the tangent spaces $T_{\gamma(t_1)}M$ and $T_{\gamma(t_2)}M$ for any two points on the curve. Mathematically, this isomorphism
$$
P : T_{\gamma(t_1)}M \to T_{\gamma(t_2)}M
$$
is the unique isomorphism with the property that for any $v \in T_{\gamma(t_1)}M$, there exists a vector field (which I'll call $v(t)$) along $\gamma$ such that $v(t_1) = v, v(t_2) = P(v)$, and $\nabla_{\gamma'(t)} v(t) = 0$ for all $t \in [t_1, t_2]$.
This isomorphism is called "parallel transport"; I like to picture a surface embedded in $\mathbb{R}^3$, such as the 2-sphere, and think of parallel transport along a curve $\gamma$ as "dragging" vectors along that curve. (Important remark: the isomorphism obtained depends on the choice of curve $\gamma$ in general.)
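To make the dependence on the curve concrete, here is a rough numerical sketch on the unit 2-sphere in $\mathbb{R}^3$. It is not part of the argument above, and it rests on an assumption: parallel transport is approximated by repeatedly projecting the vector onto the next tangent plane, which is a first-order discretization for an embedded surface. The names `sphere_point`, `project` and `transport_around_latitude` are made up for this illustration.

```python
import math

def sphere_point(colat, lon):
    """Point on the unit sphere; it doubles as the unit normal at that point."""
    return (math.sin(colat) * math.cos(lon),
            math.sin(colat) * math.sin(lon),
            math.cos(colat))

def project(p, w):
    """Orthogonally project w onto the tangent plane of the unit sphere at p."""
    dot = sum(a * b for a, b in zip(p, w))
    return tuple(b - dot * a for a, b in zip(p, w))

def transport_around_latitude(v0, colat, steps=20000):
    """Approximate parallel transport once around a latitude circle.

    Assumption: projecting onto each successive tangent plane is used as a
    first-order discretization of parallel transport.
    """
    v = v0
    for k in range(1, steps + 1):
        lon = 2.0 * math.pi * k / steps
        v = project(sphere_point(colat, lon), v)
    return v

def cosine(u, w):
    """Cosine of the angle between two vectors in R^3."""
    dot = sum(a * b for a, b in zip(u, w))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_w = math.sqrt(sum(b * b for b in w))
    return dot / (norm_u * norm_w)

colat = math.pi / 3                  # latitude circle with cos(colat) = 1/2
v0 = (0.0, 1.0, 0.0)                 # tangent vector at the start point, pointing east
v1 = transport_around_latitude(v0, colat)
turn = cosine(v0, v1)                # should be close to -1: the vector comes back reversed
```

The loop starts and ends at the same point, yet the transported vector comes back (approximately) reversed, whereas transport along the constant curve at that point is the identity. So two curves with the same endpoints can give different isomorphisms.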
Of course, once you have an isomorphism of vector spaces, you get an isomorphism of any of the associated tensor spaces as well. So if $T$ is a $(k,l)$-tensor on $T_{\gamma(t_1)}M$, then we get a $(k,l)$-tensor $PT$ on $T_{\gamma(t_2)}M$.
Now the point is that once you have this "parallel transport" isomorphism, the covariant derivative $\nabla_X \mathcal{T}$ is a literal derivative in the following precise sense: Given a vector $X \in T_pM$, let $\gamma$ be any curve with $\gamma'(0) = X$, and let $P_t$ be the "parallel transport along $\gamma$" isomorphism
$$
P_t : T_{\gamma(t)}M \to T_{\gamma(0)}M \quad (= T_pM).
$$
Then for any tensor field $\mathcal{T}$ on $M$,
$$
\nabla_X \mathcal{T} = \frac{d}{dt}\Big|_{t=0} \Big( P_t \big( \mathcal{T}(\gamma(t)) \big) \Big).
$$
This is a very precise interpretation of the idea that $\nabla_X \mathcal{T}$ gives you the derivative of $\mathcal{T}$ in the direction of $X$.
$$\newcommand\ee{\vec{e}}
\newcommand\vv{\vec{v}}
\newcommand\XX{\vec{X}}
$$
Let me start by clarifying your example.
Let $S$ be the unit circle embedded in the plane with the usual parameterization $\phi(\theta)$, and let $\XX$ be a vector field on $S$.
This means that each $\XX(\theta)$ is in the tangent space at $\phi(\theta)$, which is the one-dimensional space spanned by $(-\sin\theta, \cos\theta) = \vec{\phi'}(\theta)$.
Let's first motivate what we mean by covariant derivative. So far, $\XX$ is a nice map from $[0,2\pi)$ to the tangent lines of $S$.
In general, its ordinary derivative $\XX'$ is going to be in $\mathbb{R}^2$.
We can write $\XX(\theta) = f(\theta)\vec{\phi'}(\theta)$, where $f$ is a nice real-valued function. Then $\XX'(\theta) = f'(\theta)\vec{\phi'}(\theta)+f(\theta)\vec{\phi''}(\theta)$,
and we note that $\mathbb{R}^2$ is spanned by $\vec{\phi'}(\theta)$ and $\vec{\phi''}(\theta)$.
When we want to study $S$ intrinsically, this differentiation is not good enough, because it can give us information which lies outside the tangent lines.
So we instead take its projection. Let $\pi(\theta)$ be the orthogonal projection onto the subspace spanned by $\vec{\phi'}(\theta)$,
and consider $\pi(\theta)(\XX'(\theta))$ instead. We call this new function $\nabla_\theta \XX$; it tells us only about the part of the derivative of $\XX$ which lies along $S$. Since $\vec{\phi''}(\theta)$ is orthogonal to $\vec{\phi'}(\theta)$, this is simply $f'(\theta)\vec{\phi'}(\theta)$.
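As a sanity check on this projection recipe, here is a small numerical sketch. The coefficient function `f` is an arbitrary choice made only for illustration, and the ordinary derivative is approximated by a central difference, so the values are approximate.

```python
import math

def phi_prime(theta):
    """Unit tangent to the circle at phi(theta) = (cos theta, sin theta)."""
    return (-math.sin(theta), math.cos(theta))

def f(theta):
    """Hypothetical coefficient function, chosen only for illustration."""
    return math.sin(2.0 * theta)

def X(theta):
    """Tangent vector field X(theta) = f(theta) * phi'(theta)."""
    tx, ty = phi_prime(theta)
    return (f(theta) * tx, f(theta) * ty)

def ambient_derivative(theta, h=1e-6):
    """Central-difference approximation of the ordinary derivative X'(theta) in R^2."""
    ax, ay = X(theta + h)
    bx, by = X(theta - h)
    return ((ax - bx) / (2 * h), (ay - by) / (2 * h))

def covariant_coefficient(theta):
    """Coefficient of the projection of X'(theta) onto the tangent line."""
    dx, dy = ambient_derivative(theta)
    tx, ty = phi_prime(theta)
    return dx * tx + dy * ty

theta0 = 0.7
tangential = covariant_coefficient(theta0)   # approximates f'(theta0) = 2 cos(2 theta0)
dx, dy = ambient_derivative(theta0)
# Component along the outward radial direction: this is what the projection discards.
radial = dx * math.cos(theta0) + dy * math.sin(theta0)
```

The tangential part reproduces $f'(\theta_0)$, while the discarded radial part is $-f(\theta_0)$, coming from the $f(\theta)\vec{\phi''}(\theta)$ term.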
We now turn to the problem of parallel transport. Suppose now that $\XX(0)$ is a tangent vector at $\phi(0)$.
We want to roll it along $S$ to get a tangent vector $\XX(\theta)$ at $\phi(\theta)$ that is in some sense equivalent.
Now clearly we can take $\XX(\theta) = f(0)\vec{\phi'}(\theta)$, where $\XX(0) = f(0)\vec{\phi'}(0)$, but it is worth elaborating on the machinery behind this intuitive operation.
The key here is that our rolling is in some sense maximally intrinsic. At every step, the intrinsic part of the vector does not change.
To formalize this, we say that $\nabla_\theta \XX(\theta) = 0$.
This defines the parallel transport of $\XX(0)$.
What if instead of starting with an intrinsic derivative, we started with a notion of parallel transport? Can we recover an intrinsic derivative?
Suppose that $\psi(\theta)$ is the map that gives the parallel transport of tangent vectors at $\phi(0)$ to the tangent space at $\phi(\theta)$. This is in fact linear.
Let $\XX(\theta)$ be a vector field on $S$.
We want to recover the intrinsic component of the infinitesimal change of $\XX$ at $\theta=0$.
To do that, let $\delta>0$ be some small change in $\theta$. Can we recover the intrinsic change of $\XX(\delta)$ from $\XX(0)$?
Well, we know what $\XX(\delta)$ should look like if there is no intrinsic change at all: it is just the parallel transport $\psi(\delta)(\XX(0))$.
So we recover the intrinsic change of $\XX(\delta)$ as the difference between $\XX(\delta)$ and the parallel transport of $\XX(0)$.
That is, we recover the covariant derivative as $$\nabla_\theta \XX(0) = \lim_{\delta \to 0} \frac{\XX(\delta)-\psi(\delta)(\XX(0))}{\delta}.$$
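The limit can be checked numerically on the circle. This is only a sketch: the coefficient function `f` is a hypothetical choice with $f(0) = 1$ and $f'(0) = 2$, and `psi` implements parallel transport on the circle by keeping the coefficient along the unit tangent fixed.

```python
import math

def phi_prime(theta):
    """Unit tangent to the circle at phi(theta) = (cos theta, sin theta)."""
    return (-math.sin(theta), math.cos(theta))

def f(theta):
    """Hypothetical coefficient function with f(0) = 1 and f'(0) = 2."""
    return 1.0 + math.sin(2.0 * theta)

def X(theta):
    """Tangent vector field X(theta) = f(theta) * phi'(theta)."""
    tx, ty = phi_prime(theta)
    return (f(theta) * tx, f(theta) * ty)

def psi(theta, v0):
    """Parallel transport of a tangent vector v0 at phi(0) to phi(theta)."""
    t0x, t0y = phi_prime(0.0)
    c = v0[0] * t0x + v0[1] * t0y          # coefficient of v0 along phi'(0)
    tx, ty = phi_prime(theta)
    return (c * tx, c * ty)

def difference_quotient(delta):
    """(X(delta) - psi(delta)(X(0))) / delta, computed in the ambient plane."""
    xd = X(delta)
    pd = psi(delta, X(0.0))
    return ((xd[0] - pd[0]) / delta, (xd[1] - pd[1]) / delta)

quotient = difference_quotient(1e-5)       # should approach f'(0) * phi'(0) = (0, 2)
```

For small $\delta$ the quotient is close to $f'(0)\vec{\phi'}(0)$, which is what projecting the ordinary derivative onto the tangent line gives as well.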
Note here that it does not make any sense for the covariant derivative on the circle to be in the $\frac{\partial}{\partial r}$ direction, since that is extrinsic to the circle, and the covariant derivative gives intrinsic information only.
Now we move on to the general case.
Let $M$ be a Riemannian manifold, $g$ its metric, and $\nabla$ its connection.
The torsion-free condition specifies that for any vector fields $X$ and $Y$ on $M$, $\nabla_X Y - \nabla_Y X = [X,Y]$. Here $[X,Y]$ is the Lie bracket of vector fields.
To work in coordinates, fix a point $p$, and an open neighborhood $U$ of $p$ with coordinate functions $x^i$.
We denote by $\ee_i$ the $i$th tangent vector with respect to these coordinates.
The first thing we note is that $[\ee_i,\ee_j] = 0$ simply by the commutativity of the ordinary partial derivative.
We don't need any Christoffel symbol machinery to then derive that $\nabla_{\ee_i}\ee_j = \nabla_{\ee_j}\ee_i$; it is a straightforward consequence of the torsion-free condition.
Now we define the symbols $\gamma^k_{ij}$ such that $\nabla_{\ee_i}\ee_j = \gamma^k_{ij}\ee_k.$
Note here that the Christoffel symbols are the coefficients of the covariant derivative, not the ordinary derivative. Be careful with notation.
> $$\frac{\partial \vec{\mathbf{e_i}}}{\partial x^j}=\Gamma^k_{ij}\vec{\mathbf{e_k}}$$
Let $\vv$ be a vector field, which is given in components as $v^i\ee_i$.
Then we have that $$\nabla_{\ee_i}\vv
= \nabla_{\ee_i}(v^j\ee_j)
= (\nabla_{\ee_i}v^j)\ee_j + v^j(\nabla_{\ee_i}\ee_j)
= \frac{\partial v^j}{\partial x^i}\ee_j + v^j\gamma^k_{ij}\ee_k.$$
Now suppose that $\vv$ is equal to $\ee_l$, so $v^l = 1$ and $v^i=0$ otherwise. We then get that
$\frac{\partial v^i}{\partial x^j} = 0$, and thus that
$$\nabla_{\ee_i}\vv = \gamma^k_{il}\ee_k
= \nabla_{\ee_i}\ee_l.$$
This is a tautology; we have recovered no new information.
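To see actual numbers for the symbols $\gamma^k_{ij}$, here is a sketch for polar coordinates $(r, \theta)$ on the plane. It assumes the standard Levi-Civita formula $\gamma^k_{ij} = \tfrac12 g^{kl}(\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$, which is not derived in this answer, and it approximates the derivatives of the metric by central differences. The index convention (0 for $r$, 1 for $\theta$) and the function names are choices made for this example.

```python
def metric(r, theta):
    """Euclidean metric in polar coordinates (r, theta): diag(1, r^2)."""
    return [[1.0, 0.0], [0.0, r * r]]

def inverse_metric(r, theta):
    """Inverse of the polar metric: diag(1, 1 / r^2)."""
    return [[1.0, 0.0], [0.0, 1.0 / (r * r)]]

def metric_derivative(l, r, theta, h=1e-6):
    """Central-difference partial derivative of g with respect to coordinate l."""
    dr, dt = (h, 0.0) if l == 0 else (0.0, h)
    gp = metric(r + dr, theta + dt)
    gm = metric(r - dr, theta - dt)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def christoffel(r, theta):
    """Return G with G[k][i][j] approximating gamma^k_{ij} (assumed formula above)."""
    ginv = inverse_metric(r, theta)
    dg = [metric_derivative(l, r, theta) for l in range(2)]   # dg[l][i][j] = d_l g_ij
    return [[[0.5 * sum(ginv[k][l] * (dg[i][j][l] + dg[j][i][l] - dg[l][i][j])
                        for l in range(2))
              for j in range(2)]
             for i in range(2)]
            for k in range(2)]

G = christoffel(2.0, 0.3)
gamma_r_theta_theta = G[0][1][1]   # analytically -r, so about -2 here
gamma_theta_r_theta = G[1][0][1]   # analytically 1/r, so about 0.5 here
```

The output is symmetric in the two lower indices, matching the consequence of the torsion-free condition noted above.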
Here is where you make the mistake in your derivation.
> Let's consider the covariant derivative of the covariant basis vector. Observe
> $$\nabla_i \vec{\mathbf{e_j}} = \frac{\partial \vec{\mathbf{e_j}}}{ \partial x^i } - \Gamma^k_{ij} \vec{\mathbf{e_k}}$$
You have put here a minus instead of a plus in the right-hand side, which should read: $$= \frac{\partial \vec{\mathbf{e_j}}}{ \partial x^i } + \Gamma^k_{ij} \vec{\mathbf{e_k}}$$
Fixing this in the following steps, and using the corrected definition of Christoffel symbols, you would get:
$$ \nabla_i \vec{\mathbf{e_j}} = \frac{\partial \vec{\mathbf{e_j}}}{ \partial x^i } + \Gamma^k_{ij} \vec{\mathbf{e_k}} = \Gamma^k_{ij} \vec{\mathbf{e_k}},$$
which implies the correct result that $$\frac{\partial \vec{\mathbf{e_j}}}{ \partial x^i } = 0.$$
In general, it is true that the partial derivatives of $\ee_i$ vanish, but the covariant derivatives do not. The Christoffel symbols measure precisely by how much these differ.
There is a very intuitive way to understand the covariant derivative (for the Levi-Civita connection) for the special case of isometrically embedded submanifolds in $\mathbb R^n$. Roughly speaking, first take the usual derivative in $\mathbb R^n$, and then project the answer onto the tangent plane of the submanifold. The image of the projection is then the covariant derivative.
In fact, this is how the concept of covariant derivative is often introduced to undergraduates in their first differential geometry course. For instance, if you consider the sphere $S^2 \subset \mathbb R^3$, then it is easy (both analytically and visually) to prove that the tangent vector field to a great circle has zero covariant derivative and therefore that great circles must be geodesics.
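The great-circle claim is easy to test numerically with the "differentiate in $\mathbb R^3$, then project" recipe. In the sketch below the second derivative of the curve is approximated by a central difference, and the latitude circle is included only as a contrasting, non-geodesic example; the function names are hypothetical.

```python
import math

def project_to_tangent_plane(p, w):
    """Remove from w its component along p, the unit normal of the sphere at p."""
    dot = sum(a * b for a, b in zip(p, w))
    return tuple(b - dot * a for a, b in zip(p, w))

def covariant_acceleration(curve, t, h=1e-4):
    """Project the ambient derivative of the tangent field of a curve on S^2."""
    p = curve(t)
    a, b = curve(t + h), curve(t - h)
    accel = tuple((ai - 2.0 * pi + bi) / (h * h) for ai, pi, bi in zip(a, p, b))
    return project_to_tangent_plane(p, accel)

def great_circle(t):
    """The equator, parameterized by arc length."""
    return (math.cos(t), math.sin(t), 0.0)

def latitude_circle(t, colat=math.pi / 4):
    """A circle of constant colatitude, which is not a great circle."""
    return (math.sin(colat) * math.cos(t),
            math.sin(colat) * math.sin(t),
            math.cos(colat))

def norm(v):
    """Euclidean length of a vector in R^3."""
    return math.sqrt(sum(x * x for x in v))

on_great_circle = norm(covariant_acceleration(great_circle, 0.3))    # close to 0
on_latitude = norm(covariant_acceleration(latitude_circle, 0.3))     # close to sin(colat) * cos(colat) = 0.5
```

For the great circle the ambient acceleration points along the normal, so nothing survives the projection. For the latitude circle a tangential part remains, so its tangent field is not parallel along the curve.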