"Rolling without slipping" is a powerful idea, but the phrase doesn't necessarily lead one to the intended mental model. In particular,
torsion is something that is at issue only for manifolds of dimension 3 or higher. Perhaps you can imagine taking a 3-manifold, and rolling it along a hyperplane in 4-space --- but
the metaphor becomes strained, partly because most Riemannian 3-manifolds cannot be
smoothly isometrically embedded in $\mathbb E^4$.
Another way to think of it is this: suppose you have a smooth parametrized
curve in a Riemannian
3-manifold, say $\alpha: [0,T] \rightarrow M^3$. Then the claim is that there
exists a matching curve $\beta: [0,T] \rightarrow \mathbb E^3$ together with a map
$\phi$ from a neighborhood of the image of $\beta$ to a neighborhood of the image of $\alpha$
that takes the Euclidean metric to the metric of $M^3$ up to first order along the curve.
Furthermore, $\beta$ is uniquely determined up to an isometry of $\mathbb E^3$.
Basically, $\beta$ is what you get if you "roll" $M^3$ along $\mathbb E^3$ along $\alpha$
in a way
that best maintains contact between the two spaces: that is, "without slipping".
To see that the curve $\beta$ (assuming it exists) is uniquely determined by $\alpha$,
we can imagine trying to send a neighborhood of $\beta$ to a neighborhood of another
curve $\gamma: [0,T] \rightarrow \mathbb E^3$ in a way that maintains first order
contact of the metric. If you look at curves parallel to $\beta$ in $\mathbb E^3$, the first
derivative of their arc length
is negative in the direction that $\beta$ is curving. The logarithmic derivative of
arc length, for curves displaced along a normal vector field that remains as parallel to
itself as possible, is the magnitude of the curvature.
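To make this concrete, here is a sketch under assumptions not spelled out above: take $\beta$ parametrized by arc length, with unit tangent $T$, principal normal $N$ and curvature $\kappa$, and let $V$ be a unit normal field along $\beta$ that is as parallel as possible, meaning that $V'$ has no normal component. Then $V' = -\kappa\langle V,N\rangle T$, and the displaced curve $\beta_\epsilon = \beta + \epsilon V$ satisfies
$$
\beta_\epsilon' = \bigl(1 - \epsilon\,\kappa\,\langle V,N\rangle\bigr)\,T,
\qquad
\frac{d}{d\epsilon}\Big|_{\epsilon=0}\log|\beta_\epsilon'| = -\kappa\,\langle V,N\rangle .
$$
For $V = N$ this is $-\kappa$: negative toward the side to which $\beta$ curves, with magnitude equal to the curvature.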
If you try to twist the normal coordinate system, this corresponds to the concept
of torsion. It's easiest to visualize along a straight line: if you twist a neighborhood
of a line in space,
you distort the metric on each concentric cylinder, by changing the angles between
cross-section circles and generating lines. I.e., threads that wind around a hose
at angles $\pm \pi/4$ are effective at preventing twisting (= torsion). The same
principle holds for any curve in space: the first order behavior of the metric in a
neighborhood of the curve locks in the Frenet characterization of the curve (well,
the curvature and torsion as functions of arc length; these
are different from, but related to, the curvature and torsion of a connection).
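For the straight-line case, a short computation shows the distortion. The notation is an assumption made for illustration: cylindrical coordinates $(r,\theta,z)$ around the line, and a twist $\Phi(r,\theta,z) = (r,\theta+cz,z)$ with constant rate $c$. Pulling back the Euclidean metric $g = dr^2 + r^2\,d\theta^2 + dz^2$ gives
$$
\Phi^\ast g = dr^2 + r^2\,d\theta^2 + 2c\,r^2\,d\theta\,dz + (1+c^2r^2)\,dz^2,
\qquad
\cos\varphi = \frac{c\,r}{\sqrt{1+c^2r^2}} = c\,r + O(r^3),
$$
where $\varphi$ is the angle between the cross-section circles and the generating lines on the cylinder of radius $r$. The angle departs from $\pi/2$ at first order in $r$ unless $c=0$, consistent with the claim that first-order agreement of the metric near the line rules out twisting.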
Why does the matching curve exist? You can check derivatives etc., but
it is better to just imagine it. Basically, you could reparametrize $\alpha$ by arc length,
then project a neighborhood of $\alpha$ back to $\alpha$ by sending each point to the closest
point of $\alpha$, and parametrize the lines of projection by their arc length.
On each concentric tube, there's a unique unit vector field orthogonal to the preimages
of projection to $\alpha$. Scale this vector field so that it commutes with
projection, to get a full set of
cylindrical coordinates for a neighborhood of $\alpha$. The only first-order
invariant for the metric that is free is the first derivative of the scaling function.
Using that, you can match the first derivative
by using the curvature and torsion of a curve in space.
This process defines the affine connection on the tangent bundle. The Levi-Civita
connection is the linear part of the affine connection, which is torsion free by
definition. The non-torsion-free connections are ones that
impart twists on little neighborhoods of curves. This is usually expressed
by translating it into a formula about covariant derivatives of two vector fields
not being as commutative as it should be.
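For reference, the standard formula in question, for a connection $\nabla$ on the tangent bundle and vector fields $X$ and $Y$ (with Christoffel symbols defined by $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\partial_k$), is
$$
T(X,Y) = \nabla_X Y - \nabla_Y X - [X,Y],
\qquad
T(\partial_i,\partial_j) = \bigl(\Gamma^k_{ij} - \Gamma^k_{ji}\bigr)\,\partial_k .
$$
The connection is torsion free exactly when $T$ vanishes, that is, when the Christoffel symbols are symmetric in their lower indices.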
This really calls for pictures. Any volunteers?
NB: I'm combining my previous comments into an answer, because I believe that this is better than leaving them scattered.
As another commenter has pointed out, the skew-symmetric part of the Ricci tensor is the obstruction to there being a $\nabla$-parallel volume form in the first place. To see this, consider the first Bianchi identity: $R^i_{jkl}+R^i_{klj}+R^i_{ljk}=0$. Set $i=j$ and sum to get $R^i_{ikl}+R^i_{kli}+R^i_{lik}=0$, which, by the antisymmetry $R^i_{kli}=-R^i_{kil}$ in the last two indices, becomes $R^i_{ikl}=R^i_{kil}-R^i_{lik}$. Now $\Omega = \frac12 R^i_{ikl}\ dx^k\wedge dx^l$ is the curvature of the connection induced by $\nabla$ on the top exterior power of the cotangent bundle, and $\frac12(R^i_{kil}{-} R^i_{lik})dx^k\wedge dx^l$ is the skew-symmetric part of the Ricci tensor. Thus, the vanishing of the skew-symmetric part of Ricci is equivalent to the flatness of this induced connection on the top exterior power.
Assume now that the Ricci curvature is symmetric, so that there is a (local) $\nabla$-parallel volume form, say, $\Upsilon$. Then the Ricci curvature has the following interpretation: Let $\exp_p:T_pM\to M$ be the exponential map of $\nabla$ based at $p$. Then
$$
\exp^\ast_p(\Upsilon)=(1 - \tfrac16 R_{ij} x^ix^j + \cdots)\ dx^1\wedge dx^2\wedge\cdots\wedge dx^n,
$$
where $\exp^\ast_p\bigl(\mathrm{Ric}(\nabla)\bigr)_p = R_{ij}\, dx^idx^j$. (Here, the $x^i$ are any linear coordinates on $T_pM$ centered at $0_p$ that are $\Upsilon$-unimodular at $0_p$.) Thus, Ric gives the deviation of the parallel volume form from the exponentially flat one. (This makes sense, even though you can't define 'geodesic balls' without a metric. You still compare the volume of open neighborhoods of $p$ with respect to the two 'natural' volume forms.)
Best Answer
Here is my attempt to present the intuition behind torsion in an accessible way. Here is a similar, previous thread on MathOverflow.
In your question, you've described torsion in terms of its effect on parallel-transporting a vector along two different paths. The distinction between curvature and torsion may be more transparent if you think about scalars rather than vectors. Curvature effects vanish when you operate on a scalar, e.g., the mass of a hydrogen atom doesn't end up being different depending on which path you transport it along. But the covariant derivative does pick up an effect from the torsion when you compute the commutator of two derivatives acting on a scalar; the reason is that you're differentiating along two coordinate axes, and if there is torsion these axes themselves rotate as you move along.
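In index notation this can be written out as follows. Sign and index-order conventions vary between references, so take this as a sketch under the assumed convention $\nabla_a v_b = \partial_a v_b - \Gamma^c{}_{ab}v_c$, with torsion $T^c{}_{ab} = \Gamma^c{}_{ab}-\Gamma^c{}_{ba}$. For a scalar $f$,
$$
(\nabla_a\nabla_b - \nabla_b\nabla_a)f = -\bigl(\Gamma^c{}_{ab}-\Gamma^c{}_{ba}\bigr)\,\partial_c f = -T^c{}_{ab}\,\partial_c f .
$$
The partial derivatives of $f$ commute, so no curvature term survives; only the torsion does.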
Another nice way to distinguish between curvature and torsion is that nonvanishing torsion requires that the space have a detectable handedness to it, whereas curvature has no such handedness. E.g., in two dimensions, a bug living on a surface can never use measurements of curvature in the way we would use a magnetic compass to find north. In a real-world physical context, the experiment described at the end of [1] is looking for violations of the symmetry between left- and right-handedness.