This is a cultural comment, rather than an answer; but it is a bit long for the comment box, which is why I am writing it here.
My comment is the following: giving a connection, and giving the associated parallel transport, are essentially the same thing, and so it should be possible to establish the formula you are trying to prove by reasonably conceptual, high-level thinking, rather than mucking about too much with complicated differential equations.
Let me elaborate a little: how would you define the derivative of a vector field along a curve? Well, the idea is that you want to form the usual Newton quotient
$$ \dfrac{V\bigl(c(t+\epsilon)\bigr) - V\bigl(c(t)\bigr)}{\epsilon}$$
and then let $\epsilon$ go to zero, to compute the derivative of $V$ at the
point $c(t)$.
The only problem is that the subtraction in this formula doesn't make sense,
because the tangent vectors being subtracted are based at different points:
one is based at $c(t+\epsilon)$ and the other at $c(t)$.
Enter parallel transport: suppose we have an agreed upon way to shift, or transport, tangent
vectors along a curve. Then we can apply this so as to transport
$V\bigl(c(t+\epsilon)\bigr)$ from $c(t+\epsilon)$ to $c(t)$, and then form the
above Newton quotient, and proceed to compute a derivative.
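In symbols, the construction just described can be sketched as follows. The notation $P_{c,t,t+\epsilon}$ for the transport along $c$ from $c(t)$ to $c(t+\epsilon)$ is an assumption introduced here for illustration.

$$ \frac{DV}{dt}(t) := \lim_{\epsilon \to 0} \dfrac{P_{c,t,t+\epsilon}^{-1}\, V\bigl(c(t+\epsilon)\bigr) - V\bigl(c(t)\bigr)}{\epsilon} $$

Both terms in the numerator now lie in the single vector space $T_{c(t)}M$, so the subtraction makes sense.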
An agreed upon method for shifting vectors along curves is formally referred to as a choice of parallel transport, because if you have such a method, and if you define a vector field along a curve $c(t)$ by choosing some $V_0$ at
$c(t_0)$ and then using the agreed upon parallel transport to define
$V\bigl(c(t)\bigr)$ by transporting $V_0$ along $c$ to $c(t)$, you obtain a vector
field along $c(t)$ whose derivative at each point will be zero (by construction!). Thus this vector field does not change along $c(t)$,
and so consists of parallel vectors, hence the name parallel transport.
(Of course, the notions of change and parallel are not intrinsic to
the vector field; they depend upon our particular choice of parallel transport.)
Of course, our choice of parallel transport should satisfy some axioms (it should be linear; it should be smooth (so that the vector fields along curves
that it gives rise to are smooth); and so on). If you look in the right text,
you will find these axioms written down.
As we've seen, a choice of parallel transport on our manifold gives a way
of differentiating vector fields along curves. But since a tangent vector
is just an infinitesimal curve, one sees that we in fact have a way of
differentiating a vector field in the direction of a given tangent vector: extend that vector to a curve, and then differentiate along that curve.
Thus a choice of parallel transport determines an affine connection, i.e. a way
of differentiating vector fields along tangent vectors. Conversely, this is enough information to determine the parallel transport: we parallel transport a tangent vector along a curve by making sure that its derivative (using the given affine connection) at every point of the curve, in the tangent direction of the curve, vanishes.
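In local coordinates this vanishing condition becomes a concrete system of equations. As a sketch, assume Christoffel symbols defined by $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k$ (a convention chosen here, with summation over repeated indices), and write $V(t) = V^k(t)\,\partial_k$ for a vector field along $c$. Then $V$ is parallel along $c$ exactly when

$$ \dot{V}^k(t) + \Gamma^k_{ij}\bigl(c(t)\bigr)\, \dot{c}^i(t)\, V^j(t) = 0, \qquad k = 1, \dots, n. $$

This is a linear first-order ODE in the components $V^k$, so (at least within a coordinate chart) an initial vector $V(t_0) = V_0$ determines a unique solution, and that solution depends linearly on $V_0$. This is where the linearity axiom for parallel transport comes from.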
So we have gone full circle, from parallel transport, to affine connection, back to parallel transport. The text you are reading is spelling out the second half of this circle, but is perhaps a little scanty on details regarding the first half; in fact, the exercise you are trying to solve is exactly about the first half of the circle, and its goal is to verify that you truly are going around in a circle: i.e. that you are ending up where you started from.
There really is something to verify here (i.e. the exercise is not trivial if you have not done it before, and are learning these ideas for the first time), but I hope that the above discussion may help shed some light on its meaning,
and also make it seem less daunting (and perhaps more conceptual and less computational) than it might otherwise appear.
This is an excellent question. As indicated by the MathOverflow link in the comments, there are many ways to think about torsion and torsion-freeness. At the risk of being repetitive, allow me to summarize some of these, adding my own thoughts.
Throughout, we let $M$ be a smooth manifold, $\nabla$ a connection on $TM$, and $$T^\nabla(X,Y) = \nabla_XY - \nabla_YX - [X,Y]$$
its torsion tensor field. We let $X$, $Y$ denote vector fields.
Initial Observations
(1) Parallel coordinates
Torsion (at a point) can be seen as the obstruction to the existence of parallel coordinates (at that point):
Fact: Let $p \in M$. Then $T^\nabla|_p = 0$ if and only if there exists a coordinate system $(x^i)$ centered at $p$ such that $\nabla \partial_i |_p = 0$.
The point here is that if $T^\nabla = 0$, then any parallel frame is commuting (i.e.: $\nabla E_i = 0$ $\forall i$ $\implies$ $[E_i, E_j] = 0$ $\forall i,j$), hence is a coordinate frame (by the "Flowbox Coordinate Theorem").
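The implication in parentheses is a one-line computation from the definition of $T^\nabla$. If $\nabla E_i = 0$ for all $i$, then

$$ T^\nabla(E_i, E_j) = \nabla_{E_i}E_j - \nabla_{E_j}E_i - [E_i,E_j] = -[E_i,E_j], $$

so for a parallel frame, the vanishing of the torsion is the same as the vanishing of all the brackets $[E_i,E_j]$.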
(2) Commuting of second partials
The following two facts indicate that torsion can be thought of as the obstruction to (certain types of) second partial derivatives commuting.
For a smooth function $f \colon M \to \mathbb{R}$, recall that its covariant Hessian (or second covariant derivative) is the covariant $2$-tensor field defined by
$$\text{Hess}(f) := \nabla \nabla f = \nabla df.$$
Explicitly, $\text{Hess}(f)(X,Y) = (\nabla_X df)(Y) = X(Yf) - (\nabla_XY)(f)$.
Fact [Lee]: The following are equivalent:
(i) $T^\nabla = 0$
(ii) The Christoffel symbols of $\nabla$ with respect to any coordinate system are symmetric: $$\Gamma^k_{ij} = \Gamma^k_{ji}$$
(iii) The covariant Hessian of any smooth function $f$ is symmetric: $$\text{Hess}(f)(X,Y) = \text{Hess}(f)(Y,X)$$
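These equivalences can be checked directly. The following is a sketch, assuming the convention $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k$ for the Christoffel symbols. Since coordinate vector fields commute,

$$ T^\nabla(\partial_i, \partial_j) = \nabla_{\partial_i}\partial_j - \nabla_{\partial_j}\partial_i = \bigl(\Gamma^k_{ij} - \Gamma^k_{ji}\bigr)\,\partial_k, $$

which gives (i) $\iff$ (ii) because $T^\nabla$ is a tensor. Using the explicit formula for the Hessian,

$$ \text{Hess}(f)(X,Y) - \text{Hess}(f)(Y,X) = X(Yf) - Y(Xf) - (\nabla_XY - \nabla_YX)(f) = [X,Y]f - \bigl(T^\nabla(X,Y) + [X,Y]\bigr)f = -T^\nabla(X,Y)f, $$

so the Hessian of every $f$ is symmetric if and only if $T^\nabla(X,Y)f = 0$ for all $f$, i.e. if and only if $T^\nabla = 0$.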
Torsion-freeness also implies another kind of symmetry of second partials:
Symmetry Lemma [Lee]: If $T^\nabla = 0$, then for every smooth family of curves $\Gamma \colon (-\epsilon, \epsilon) \times [a,b] \to M$, we have
$$\frac{D}{ds} \frac{\partial}{\partial t} \Gamma(s,t) = \frac{D}{dt} \frac{\partial}{\partial s} \Gamma(s,t).$$
I don't know for certain whether the converse to the Symmetry Lemma is true, but I imagine it is.
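A coordinate computation makes the role of torsion in the Symmetry Lemma explicit. As a sketch, assume Christoffel symbols $\Gamma^k_{ij}$ with the convention $\nabla_{\partial_i}\partial_j = \Gamma^k_{ij}\,\partial_k$, and write $x^k(s,t)$ for the coordinate components of the family $\Gamma(s,t)$ (notation introduced here to avoid a clash with the Christoffel symbols). Writing $\partial_s\Gamma$ and $\partial_t\Gamma$ for the two velocity fields, one finds

$$ \frac{D}{ds}\,\partial_t\Gamma - \frac{D}{dt}\,\partial_s\Gamma = \bigl(\Gamma^k_{ij} - \Gamma^k_{ji}\bigr)\, \frac{\partial x^i}{\partial s}\, \frac{\partial x^j}{\partial t}\; \partial_k = T^\nabla\bigl(\partial_s\Gamma, \partial_t\Gamma\bigr), $$

because the mixed partials $\partial^2 x^k / \partial s\,\partial t$ cancel. Any two tangent vectors $v, w$ at a point $p$ can be realized as $\partial_s\Gamma$ and $\partial_t\Gamma$ of some family (for example $x(s,t) = sv + tw$ for small $s, t$ in coordinates centered at $p$), so this computation suggests that the converse holds as well.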
Some Heuristic Interpretations
(i) "Twisting" of parallel vector fields along geodesics
Suppose we have a connection $\nabla$ on $\mathbb{R}^n$ whose geodesics are lines, but that has torsion. One could then imagine that parallel translating a vector along a line results in the vector "spinning" along the line, as if one were holding each end of a string and rolling it between one's fingers.
An explicit example of such a connection is in the MathOverflow answer linked in the comments.
The justification for why this interpretation should be believed in general will be discussed below in (B).
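For concreteness, here is one standard example of this kind on $\mathbb{R}^3$. It is offered as a sketch and is not necessarily the example from the linked answer. Let $\overline{\nabla}$ be the Euclidean connection, $\times$ the cross product, and $\lambda \neq 0$ a constant (all notation assumed here), and set

$$ \nabla_X Y := \overline{\nabla}_X Y + \lambda\, X \times Y. $$

Then

$$ \nabla_{\dot c}\,\dot c = \ddot c + \lambda\, \dot c \times \dot c = \ddot c, \qquad T^\nabla(X,Y) = \lambda\,(X \times Y - Y \times X) = 2\lambda\, X \times Y, $$

so the geodesics are straight lines while the torsion is nonzero. A vector field $V$ along the line $c(t) = p + tv$ is parallel exactly when $\dot V = -\lambda\, v \times V$, which says that $V$ rotates about the axis $v$ at the constant angular rate $|\lambda|\,|v|$. This is the "spinning" described above.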
On the MO thread, Igor Belegradek points out two related facts:
Fact [Spivak]:
(1) Two connections $\nabla^1$, $\nabla^2$ on $TM$ are equal if and only if they have the same geodesics and torsion tensors.
(2) For every connection on $TM$, there is a unique torsion-free connection with the same geodesics.
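The torsion-free connection in (2) can be written down explicitly. As a sketch (the name $\widetilde{\nabla}$ is introduced here for illustration), define

$$ \widetilde{\nabla}_X Y := \nabla_X Y - \tfrac{1}{2}\, T^\nabla(X,Y). $$

Using the antisymmetry of $T^\nabla$, its torsion is

$$ T^{\widetilde{\nabla}}(X,Y) = T^\nabla(X,Y) - \tfrac{1}{2}T^\nabla(X,Y) + \tfrac{1}{2}T^\nabla(Y,X) = 0, $$

and along any curve $c$,

$$ \widetilde{\nabla}_{\dot c}\,\dot c = \nabla_{\dot c}\,\dot c - \tfrac{1}{2}\,T^\nabla(\dot c, \dot c) = \nabla_{\dot c}\,\dot c, $$

so $\widetilde{\nabla}$ and $\nabla$ have the same geodesics. Uniqueness then follows from (1).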
(ii) Closing of geodesic parallelograms (to second order)
Let $v, w \in T_pM$ be tangent vectors. Let $\gamma_v$ and $\gamma_w$ be the geodesics whose initial tangent vectors are $v$, $w$, respectively. Consider parallel translating the vector $w$ along $\gamma_v$, and also the vector $v$ along $\gamma_w$. Then the tips of the resulting two vectors agree to second order if and only if $T^\nabla|_p = 0$.
Heuristic reasons for this (and a picture!) are given in this excellent answer by Sepideh Bakhoda.
A precise proof of this fact is outlined by Robert Bryant at the end of this MO answer of his.
More Reasons We Like $T^\nabla = 0$
(A) Submanifolds of $\mathbb{R}^N$ come with torsion-free connections
Suppose $(M,g)$ is isometrically immersed into $\mathbb{R}^N$.
As hinted in the comments, the Euclidean connection $\overline{\nabla}$ on $\mathbb{R}^N$ is torsion-free. It is a fact that the tangential component of $\overline{\nabla} = \nabla^\top + \nabla^\perp$ defines an induced connection on $M \subset \mathbb{R}^N$. This induced connection on $M$ will then also be torsion-free (and compatible with the induced metric).
Point: If $(M,g) \subset \mathbb{R}^N$ is an isometrically immersed submanifold, then its induced connection is torsion-free.
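The torsion-freeness is a short computation. As a sketch, let $X$, $Y$ be vector fields tangent to $M$, extended arbitrarily to a neighborhood in $\mathbb{R}^N$, and write $\nabla^\top_X Y = (\overline{\nabla}_X Y)^\top$ for the induced connection. Then, along $M$,

$$ \nabla^\top_X Y - \nabla^\top_Y X = \bigl(\overline{\nabla}_X Y - \overline{\nabla}_Y X\bigr)^\top = [X,Y]^\top = [X,Y], $$

where the second equality uses that $\overline{\nabla}$ is torsion-free, and the last uses that the bracket of two vector fields tangent to $M$ is again tangent to $M$.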
This example is more general than it seems: by the Nash Embedding Theorem, every Riemannian manifold $(M,g)$ can be isometrically embedded in some $\mathbb{R}^N$.
(B) $T = d^\nabla(\text{Id})$
[I'll add this another time.]
(C) Simplification of identities
Finally, I should mention that $T^\nabla = 0$ greatly simplifies many identities.
First, we have the Ricci Formula
$$\nabla^2_{X,Y}Z - \nabla^2_{Y,X}Z = R(X,Y)Z - \nabla_{T^\nabla(X,Y)}Z.$$
Thus, in the case where $T^\nabla = 0$, we can interpret the curvature $R(X,Y)$ as the obstruction to commuting second covariant derivatives of vector fields.
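The Ricci Formula follows from unwinding the definitions. As a sketch, assume the conventions $\nabla^2_{X,Y}Z = \nabla_X\nabla_YZ - \nabla_{\nabla_XY}Z$ and $R(X,Y)Z = \nabla_X\nabla_YZ - \nabla_Y\nabla_XZ - \nabla_{[X,Y]}Z$ (other sign conventions for $R$ exist). Then

$$ \begin{aligned} \nabla^2_{X,Y}Z - \nabla^2_{Y,X}Z &= \nabla_X\nabla_YZ - \nabla_Y\nabla_XZ - \nabla_{\nabla_XY - \nabla_YX}Z \\ &= R(X,Y)Z + \nabla_{[X,Y]}Z - \nabla_{T^\nabla(X,Y) + [X,Y]}Z \\ &= R(X,Y)Z - \nabla_{T^\nabla(X,Y)}Z. \end{aligned} $$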
In the presence of torsion, the First and Second Bianchi Identities read, respectively,
$$\mathfrak{S}(R(X,Y)Z) = \mathfrak{S}[ T(T(X,Y),Z) + (\nabla_XT)(Y,Z)],$$
$$\mathfrak{S}[(\nabla_XR)(Y,Z) + R(T(X,Y),Z)] = 0,$$
where $\mathfrak{S}$ denotes the cyclic sum over $X,Y,Z$.
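To see the simplification concretely, set $T^\nabla = 0$ in the two identities and write out the cyclic sums. This recovers the familiar forms

$$ R(X,Y)Z + R(Y,Z)X + R(Z,X)Y = 0, $$
$$ (\nabla_XR)(Y,Z) + (\nabla_YR)(Z,X) + (\nabla_ZR)(X,Y) = 0. $$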
References
[Lee] "Riemannian Manifolds: An Introduction to Curvature"
[Spivak] "A Comprehensive Introduction to Differential Geometry: Volume II"
Best Answer
You can probably make your idea work but it won't be easy. The reason is that your formula that recovers the connection from the parallel transport is true not only for the Levi-Civita connection but also for arbitrary connections. This means that in order to identify the right hand side as the Levi-Civita connection, you will need to understand what makes the parallel transport of the Levi-Civita connection special compared with the parallel transport of a general connection. The compatibility with the metric is easy - this implies that parallel transport is an isometry. However, to understand how the symmetry affects the parallel transport is much more delicate (see here for example).
A much less painful way to solve the exercise is to use the notion of a parallel frame along $c$ (which if I remember correctly is introduced in one of the other exercises). Namely, pick some basis $\xi_1(p), \dots, \xi_n(p)$ of $T_pM$, where $p = c(t_0)$, and extend it by parallel transport to a frame $(\xi_1, \dots, \xi_n)$ of vector fields along $c$. Now, write the restriction of $Y$ to $c(t)$ as $Y = Y^i(t) \xi_i(c(t))$ (summation convention is in use) and note that
$$ (\nabla_X Y)(c(t)) = \frac{DY(c(t))}{dt} = \dot{Y}^i(t) \xi_i(c(t)) + Y^i(t) \frac{D\xi_i(t)}{dt} = \dot{Y}^i(t) \xi_i (c(t)) $$
which means that the covariant derivative relative to the frame $\xi_i$ is given simply by the regular derivative. Then,
$$ \begin{aligned} \frac{d}{dt} \left( P_{c,t_0, t}^{-1}\bigl(Y(c(t))\bigr) \right)\Big|_{t = t_0} &= \frac{d}{dt} \left( P_{c,t_0, t}^{-1}\bigl(Y^i(t) \xi_i(c(t))\bigr) \right)\Big|_{t = t_0} \\ &= \frac{d}{dt} \left( Y^i(t) \xi_i(p) \right)\Big|_{t = t_0} = \dot{Y}^i(t_0) \xi_i(p) = (\nabla_X Y)(p). \end{aligned} $$
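The step that replaces $\xi_i(c(t))$ by $\xi_i(p)$ deserves a word. Assuming, as the notation suggests, that $P_{c,t_0,t}$ denotes parallel transport along $c$ from $c(t_0) = p$ to $c(t)$, linearity of parallel transport and the construction of the frame give

$$ P_{c,t_0,t}^{-1}\bigl(Y^i(t)\, \xi_i(c(t))\bigr) = Y^i(t)\, P_{c,t_0,t}^{-1}\bigl(\xi_i(c(t))\bigr) = Y^i(t)\, \xi_i(p), $$

since $\xi_i(c(t)) = P_{c,t_0,t}\bigl(\xi_i(p)\bigr)$ by definition of the parallel frame. In other words, in a parallel frame $P_{c,t_0,t}^{-1}$ acts as the identity on components, so the derivative falls only on the functions $Y^i(t)$.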