This is a cultural comment, rather than an answer; but it is a bit long for the comment box, which is why I am writing it here.
My comment is the following: giving a connection, and giving the associated parallel transport, are essentially the same thing, and so it should be possible to establish the formula you are trying to prove by reasonably conceptual, high-level, thinking, rather than mucking about too much with complicated differential equations.
Let me elaborate a little: how would you define the derivative of a vector field along a curve? Well, the idea is that you want to form the usual Newton quotient
$$ \dfrac{V\bigl(c(t+\epsilon)\bigr) - V\bigl(c(t)\bigr)}{\epsilon}$$
and then let $\epsilon$ go to zero, to compute the derivative of $V$ at the
point $c(t)$.
The only problem is that the subtraction in this formula doesn't make sense,
because the tangent vectors being subtracted are based at different points:
one is based at $c(t+\epsilon)$ and the other at $c(t)$.
Enter parallel transport: suppose we have an agreed-upon way to shift, or transport, tangent
vectors along a curve. Then we can apply this so as to transport
$V\bigl(c(t+\epsilon)\bigr)$ from $c(t+\epsilon)$ to $c(t)$, and then form the
above Newton quotient, and proceed to compute a derivative.
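In symbols, and only as a sketch: write $P^{c}_{t+\epsilon\to t}$ (my notation, assumed here for illustration, and not necessarily that of your text) for the map that transports tangent vectors at $c(t+\epsilon)$ to tangent vectors at $c(t)$ along $c$. The repaired Newton quotient then reads
$$ \frac{DV}{dt}(t) = \lim_{\epsilon\to 0} \dfrac{P^{c}_{t+\epsilon\to t}\,V\bigl(c(t+\epsilon)\bigr) - V\bigl(c(t)\bigr)}{\epsilon},$$
where both terms of the numerator now lie in the same tangent space $T_{c(t)}M$, so that the subtraction makes sense.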
An agreed-upon method for shifting vectors along curves is formally referred to as a choice of parallel transport, because if you have such a method, and if you define a vector field along a curve $c(t)$ by choosing some $V_0$ at
$c(t_0)$ and then using the agreed-upon parallel transport to define
$V\bigl(c(t)\bigr)$ by transporting $V_0$ along $c$ to $c(t)$, you obtain a vector
field along $c(t)$ whose derivative at each point will be zero (by construction!). Thus this vector field does not change along $c(t)$,
and so consists of parallel vectors, hence the name parallel transport.
(Of course, the notions of change and parallel are not intrinsic to
the vector field; they depend upon our particular choice of parallel transport.)
Of course, our choice of parallel transport should satisfy some axioms (it should be linear; it should be smooth (so that the vector fields along curves
that it gives rise to are smooth); and so on). If you look in the right text,
you will find these axioms written down.
As we've seen, a choice of parallel transport on our manifold gives a way
of differentiating vector fields along curves. But since a tangent vector
is just an infinitesimal curve, one sees that we in fact have a way of
differentiating a vector field in the direction of a given tangent vector: extend that tangent vector to a curve, and then differentiate along that curve.
Thus a choice of parallel transport determines an affine connection, i.e. a way
of differentiating vector fields along tangent vectors. Conversely, this is enough information to determine the parallel transport: we parallel transport a tangent vector along a curve by making sure that its derivative (using the given affine connection) at every point of the curve, in the tangent direction of the curve, vanishes.
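To make the converse direction concrete, here is a minimal numerical sketch (mine, not taken from your text). It assumes the round unit sphere with coordinates $(\theta,\phi)$, metric $d\theta^2+\sin^2\theta\,d\phi^2$, and its Levi-Civita connection, whose nonzero Christoffel symbols are $\Gamma^\theta_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi}=\Gamma^\phi_{\phi\theta}=\cot\theta$. The condition "the derivative along the curve vanishes" becomes the linear system $\dot V^k+\Gamma^k_{ij}\,\dot c^i V^j=0$, which the code integrates once around the latitude circle $\theta=\theta_0$, $\phi=t$. The function names and the choice of a fixed-step Runge-Kutta scheme are assumptions made for illustration.

```python
import math

def transport_along_latitude(theta0, v0, steps=2000):
    """Parallel transport (V^theta, V^phi) once around the latitude circle
    theta = theta0, phi = t, t in [0, 2*pi], on the round unit sphere.

    Solves dV^k/dt + Gamma^k_ij c'^i V^j = 0 with classical RK4.
    Assumes 0 < theta0 < pi (away from the poles).
    """
    # Christoffel symbols of d(theta)^2 + sin^2(theta) d(phi)^2 at theta0.
    gamma_theta_phiphi = -math.sin(theta0) * math.cos(theta0)
    gamma_phi_thetaphi = math.cos(theta0) / math.sin(theta0)

    def rhs(v):
        v_theta, v_phi = v
        # The curve velocity is (0, 1), so only terms with i = phi survive.
        return (-gamma_theta_phiphi * v_phi, -gamma_phi_thetaphi * v_theta)

    def step(v, k, scale):
        # Helper: v + scale * k, componentwise.
        return (v[0] + scale * k[0], v[1] + scale * k[1])

    h = 2.0 * math.pi / steps
    v = v0
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs(step(v, k1, 0.5 * h))
        k3 = rhs(step(v, k2, 0.5 * h))
        k4 = rhs(step(v, k3, h))
        v = (v[0] + h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
             v[1] + h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
    return v

def norm_squared(theta0, v):
    """g(V, V) for the round metric at colatitude theta0."""
    return v[0] ** 2 + math.sin(theta0) ** 2 * v[1] ** 2

if __name__ == "__main__":
    theta0 = math.pi / 3
    v_start = (1.0, 0.0)
    v_end = transport_along_latitude(theta0, v_start)
    print("start:", v_start, "end:", v_end)
    print("g(V,V):", norm_squared(theta0, v_start), "->", norm_squared(theta0, v_end))
```

In the orthonormal frame $(\partial_\theta,\ \partial_\phi/\sin\theta_0)$ this system is a rotation at constant rate $\cos\theta_0$, so one full trip around the latitude rotates the vector through the angle $2\pi\cos\theta_0$. At $\theta_0=\pi/3$ that angle is $\pi$: the transported vector comes back reversed, with its length unchanged.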
So we have gone full circle, from parallel transport, to affine connection, back to parallel transport. The text you are reading is spelling out the second half of this circle, but is perhaps a little scanty on details regarding the first half; in fact, the exercise you are trying to solve is exactly about the first half of the circle, and its goal is to verify that you truly are going around in a circle: i.e. that you are ending up where you started from.
There really is something to verify here (i.e. the exercise is not trivial if you have not done it before, and are learning these ideas for the first time), but I hope that the above discussion may help shed some light on its meaning,
and also make it seem less daunting (and perhaps more conceptual and less computational) than it might otherwise appear.
From my understanding, by defining a connection on your manifold, you
provide a way to identify vectors at one point of the manifold with
vectors at another point on the manifold via parallel transporting the
vector.
Parallel transport depends on

1. a Riemannian manifold $(M, g)$, for example the round unit sphere;
2. a pair of points $p$ and $q$ of $M$, not necessarily distinct;
3. a piecewise-smooth path $\gamma:[0, 1] \to M$ starting at $p$ and ending at $q$, i.e., satisfying $p = \gamma(0)$ and $q = \gamma(1)$.

(It's not essential that the parameter interval be the unit interval $[0, 1]$; an arbitrary closed, bounded interval will do.)
A Riemannian metric induces a Levi-Civita connection. In your example, the round sphere has a connection already, for which a tangent vector to a great circle arc remains tangent to the arc under parallel transport along the arc.
The example of the round sphere demonstrates dependence of parallel transport on $\gamma$. If $\gamma$ were a constant path, or a great circle arc traced forward and backward, parallel transport along $\gamma$ would be the identity map. For the spherical triangle $\gamma$ in your diagram, parallel transport along $\gamma$ is not the identity.
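The path dependence can be checked numerically. The sketch below is an illustration under stated assumptions, not a general-purpose routine: it takes the unit sphere in $\mathbf{R}^3$ and, since I can't be sure which triangle your diagram shows, the triangle with vertices on the three positive coordinate axes. It relies on the fact that Levi-Civita parallel transport along a great circle arc from $a$ to $b$ is the restriction of the rotation of $\mathbf{R}^3$ about the axis $a\times b$ that carries $a$ to $b$. The function names are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transport_along_arc(v, a, b):
    """Parallel transport the tangent vector v at a to b along the shorter
    great circle arc on the unit sphere.

    Assumes a and b are unit vectors that are neither equal nor antipodal.
    Uses Rodrigues' rotation formula about the axis a x b.
    """
    n = cross(a, b)
    s = math.sqrt(dot(n, n))
    if s == 0.0:
        raise ValueError("a and b must be neither equal nor antipodal")
    k = tuple(x / s for x in n)          # unit rotation axis
    angle = math.atan2(s, dot(a, b))     # angle between a and b
    c, sn = math.cos(angle), math.sin(angle)
    k_cross_v = cross(k, v)
    k_dot_v = dot(k, v)
    return tuple(v[i] * c + k_cross_v[i] * sn + k[i] * k_dot_v * (1.0 - c)
                 for i in range(3))

def transport_along_polygon(v, vertices):
    """Transport v along consecutive great circle arcs through the vertices."""
    for a, b in zip(vertices, vertices[1:]):
        v = transport_along_arc(v, a, b)
    return v

if __name__ == "__main__":
    p, q, r = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
    v0 = (0.0, 1.0, 0.0)  # tangent at p, pointing along the arc toward q
    print("around the triangle:", transport_along_polygon(v0, [p, q, r, p]))
    print("out and back:       ", transport_along_polygon(v0, [p, q, p]))
```

Going out along an arc and straight back returns $v_0$, while going around this triangle returns a vector at right angles to $v_0$. That quarter turn matches the area $\pi/2$ of the triangle, which covers one eighth of the sphere.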
To emphasize (what seems to be) the underlying issue: The path $\gamma$ is a crucial piece of data in parallel transport; there's no well-defined notion of "parallel transport from $p$ to $q$" except in very special circumstances, such as parallel transport in a Euclidean plane, or on a flat torus. (Flatness—identically-vanishing Gaussian/sectional curvature—is necessary but not sufficient.)
Best Answer
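Let $(M,g)$ be a Riemannian manifold with $\nabla$ any $g$-compatible connection, and let $V(t)$ be a parallel vector field along a curve $c$, so that $\nabla_{\dot c}V=0$. Then $\frac{d}{dt}g(V(t),V(t))=g(\nabla_{\dot c} V(t),V(t))+g(V(t),\nabla_{\dot c}V(t))=0$, so $g(V(t),V(t))$ is constant, and hence the parallel transport $P$ preserves lengths.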
Edit: see comments.
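You have $0=\nabla_Vg$. Writing $g=\sum_{i,j} g_i\otimes g_j$, where the $g_i$ and $g_j$ are 1-forms, this is the same as $0=\sum_{i,j} \nabla_V(g_i\otimes g_j)=\sum_{i,j}(\nabla_Vg_i\otimes g_j+g_i\otimes \nabla_Vg_j)$. Then we get: $$(\nabla_Vg)(X,Y)=\sum_{i,j}\bigl((\nabla_Vg_i)(X)\,g_j(Y)+g_i(X)\,(\nabla_Vg_j)(Y)\bigr).$$ Now, by the rule for covariant derivatives of 1-forms, $(\nabla_Vg_i)(X)=V(g_i(X))-g_i(\nabla_VX)$. Putting this into the previous formula, and using the product rule $V\bigl(g_i(X)\,g_j(Y)\bigr)=V(g_i(X))\,g_j(Y)+g_i(X)\,V(g_j(Y))$, we get: $$\sum_{i,j} \bigl(V(g_i(X))\,g_j(Y)-g_i(\nabla_VX)\,g_j(Y)+g_i(X)\,V(g_j(Y))-g_i(X)\,g_j(\nabla_VY)\bigr)=V(g(X,Y))-\bigl(g(\nabla_VX,Y)+g(X,\nabla_VY)\bigr).$$ Since the left-hand side is $(\nabla_Vg)(X,Y)=0$, this gives $V(g(X,Y))=g(\nabla_VX,Y)+g(X,\nabla_VY)$.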
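As a worked instance of metric compatibility (a sketch, with the symbols $X$, $Y$, $c$ chosen here for illustration): let $X(t)$ and $Y(t)$ be two vector fields that are parallel along a curve $c$, so that $\nabla_{\dot c}X=0$ and $\nabla_{\dot c}Y=0$. Differentiating along $c$ gives
$$\frac{d}{dt}\,g\bigl(X(t),Y(t)\bigr)=g\bigl(\nabla_{\dot c}X,\,Y\bigr)+g\bigl(X,\,\nabla_{\dot c}Y\bigr)=0,$$
so $g(X,Y)$ is constant along $c$. Taking $X=Y$ recovers the preservation of lengths, and the general case shows that parallel transport also preserves angles between vectors.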