- Let me assume the bundle is trivial. Then $A$ is a $\mathfrak{g}$-valued 1-form on the base $M$. Let $\gamma:[0,1]\rightarrow M$ be a parametrization of $C$. A horizontal lift $\tilde{\gamma}:[0,1]\rightarrow P\cong M\times G$ is just a pair $\tilde{\gamma}(t)=(\gamma(t),g(t))$, where $g(t)\in G$. The parallel transport equation then says $g^*\theta = \gamma^* A$, where $\theta$ is the Maurer-Cartan form on $G$. Explicitly, for matrix groups the equation reads $$g(t)^{-1}dg(t) = A(\gamma(t)).$$ The solution to this equation is by definition the path-ordered exponential
$$g(1)=g(0)\,\mathcal{P}\exp\left(\int_0^1 \gamma^*A\right)=g(0)\,\mathcal{P}\exp\left(\oint_C A\right).$$
This works for general groups as well if you use the exponential map.
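As a rough numerical illustration of the transport equation $g^{-1}dg = \gamma^*A$, here is a minimal sketch using NumPy. The helper names (`exp_antihermitian`, `transport`, `a_example`) and the particular $\mathfrak{su}(2)$-valued pullback are my own choices for illustration, not anything canonical. Since the equation is unchanged when $g$ is multiplied on the left by a constant, the initial value shows up as the leftmost factor.

```python
import numpy as np

def exp_antihermitian(X):
    """exp(X) for anti-Hermitian X, via the eigen-decomposition of H = -iX."""
    w, v = np.linalg.eigh(-1j * X)
    return (v * np.exp(1j * w)) @ v.conj().T

def transport(a_of_t, g0, steps=2000):
    """Integrate g^{-1} dg/dt = a(t) on [0, 1].

    a_of_t(t) is the matrix (gamma^* A)(d/dt), assumed anti-Hermitian here.
    Each step multiplies on the right, g -> g exp(a dt), so the initial
    value g0 stays the leftmost factor.
    """
    g = np.array(g0, dtype=complex)
    dt = 1.0 / steps
    for n in range(steps):
        g = g @ exp_antihermitian(a_of_t((n + 0.5) * dt) * dt)
    return g

# Hypothetical su(2)-valued pullback along a curve, chosen only for illustration.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

def a_example(t):
    return 1j * (np.cos(2 * np.pi * t) * sx + np.sin(2 * np.pi * t) * sy)

g1 = transport(a_example, np.eye(2))
print(np.round(g1, 6))
```

Because every step is an exact exponential of an anti-Hermitian matrix, the result stays in $SU(2)$ up to rounding, whatever the step count.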
For an abelian group $G$ one has $F=dA$, and so you can use Stokes' theorem to write the exponent as $\int_D F$, where $D$ is a surface bounded by $C$. Note that any connected compact abelian group is isomorphic to $U(1)^n$. The only difference for non-abelian groups $G$ is that you cannot reduce the integral of the connection to an integral of the curvature.
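The abelian statement is easy to check numerically. The sketch below assumes a $U(1)$ connection written as $i a$ with $a$ a real 1-form, and takes the hypothetical choice $a = \tfrac{B}{2}(x\,dy - y\,dx)$, so that $da = B\,dx\wedge dy$; the values of `B` and `r` are arbitrary. It compares the line integral around a circle with the flux through the disc it bounds.

```python
import numpy as np

B = 0.3   # constant field strength (illustrative value)
r = 1.5   # radius of the circular loop C (illustrative value)

def a(x, y):
    """Components (a_x, a_y) of the real 1-form a = (B/2)(x dy - y dx)."""
    return np.array([-0.5 * B * y, 0.5 * B * x])

# Midpoint rule around the circle, parametrized by the angle t in [0, 2 pi).
n = 20000
t = (np.arange(n) + 0.5) * 2 * np.pi / n
x, y = r * np.cos(t), r * np.sin(t)
dx_dt, dy_dt = -r * np.sin(t), r * np.cos(t)
a_x, a_y = a(x, y)

line_integral = np.sum(a_x * dx_dt + a_y * dy_dt) * 2 * np.pi / n
flux = B * np.pi * r**2              # integral of F = da over the disc D

holonomy = np.exp(1j * line_integral)
print(line_integral, flux, holonomy)
```

The two numbers agree to rounding error, and the holonomy is the single phase $e^{i\int_D F}$; no ordering is needed because everything commutes.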
To evaluate a Wilson loop, you only need $R$ to be a representation of $\mathfrak{g}$; it does not have to exponentiate to a representation of $G$. It works as follows: you first use your representation to make $A$ a matrix-valued 1-form and then use the ordinary matrix path-ordered exponential. If the representation exponentiates, this coincides with the usual definition, where you first use the exponential map and then take the trace in the representation.
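Here is a minimal sketch of that recipe for $\mathfrak{su}(2)$, assuming the connection along the loop is written as $i\,a^k(t)J_k$ with Hermitian spin matrices $J_k$ (the physics convention). The function names `spin_matrices` and `wilson_loop` are hypothetical, and `components(t)` stands for whatever real coefficients $a^k(t)$ your pulled-back connection has.

```python
import numpy as np

def spin_matrices(spin):
    """Hermitian spin matrices (Jx, Jy, Jz) of dimension 2*spin + 1."""
    m = np.arange(spin, -spin - 1, -1)               # spin, spin-1, ..., -spin
    jz = np.diag(m).astype(complex)
    # <m+1| J+ |m> = sqrt(s(s+1) - m(m+1)), placed on the superdiagonal.
    raise_coeff = np.sqrt(spin * (spin + 1) - m[1:] * (m[1:] + 1))
    jp = np.diag(raise_coeff, k=1).astype(complex)
    jx = (jp + jp.conj().T) / 2
    jy = (jp - jp.conj().T) / (2 * 1j)
    return jx, jy, jz

def wilson_loop(components, spin, steps=2000):
    """Trace of the path-ordered exponential in the spin representation.

    components(t) returns the three real numbers a^k(t) for t in [0, 1];
    the Lie algebra element at time t is represented by i * a^k(t) J_k.
    """
    gens = spin_matrices(spin)
    u = np.eye(gens[0].shape[0], dtype=complex)
    dt = 1.0 / steps
    for n in range(steps):
        coeffs = components((n + 0.5) * dt)
        h = sum(c * j for c, j in zip(coeffs, gens))   # Hermitian matrix
        w, v = np.linalg.eigh(h)
        u = u @ ((v * np.exp(1j * w * dt)) @ v.conj().T)
    return np.trace(u)

# Same (hypothetical) connection evaluated in two different representations.
loop = lambda t: (np.cos(2 * np.pi * t), np.sin(2 * np.pi * t), 0.4)
print(wilson_loop(loop, 0.5), wilson_loop(loop, 1))
```

Only the matrices $J_k$ change between representations; the curve and the coefficients $a^k(t)$ stay the same.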
I understand Witten's remark as follows. Since your action is $\frac{1}{e^2}\int F\wedge *F$, to set up the perturbation series you rescale $A\rightarrow e A$. That means you have to expand the exponential in the Wilson loop as a Taylor series in $e$. If your connection is not coupled to anything else, all your diagrams are photons emitted and absorbed by the Wilson loop.
For example, at order $e^2$ you would have a process like $\langle \mathrm{Tr}(AA)\rangle$, which you can think of as a Wilson loop together with a photon propagator attached, where the photon carries the representation indices away. So, half of the Wilson loop "carries" a representation (it corresponds to the trace), while the other half has the zero representation (since you are multiplying $A$'s). I hope it's clear without a picture.
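To make the order counting explicit, here is a sketch of the expansion after the rescaling $A\rightarrow eA$. I write $a(t)$ for $\gamma^*A$ evaluated on $d/dt$ and $\mathrm{Tr}_R$ for the trace in the representation $R$; this notation is mine, not Witten's.

$$\mathrm{Tr}_R\,\mathcal{P}\exp\left(e\oint_C A\right) = \dim R + e\int_0^1 dt\,\mathrm{Tr}_R\,a(t) + e^2\int_0^1 dt_1\int_0^{t_1} dt_2\,\mathrm{Tr}_R\big(a(t_1)\,a(t_2)\big) + O(e^3).$$

At order $e^2$ the ordering of the two factors does not matter, because the trace is cyclic. Taking the expectation value turns the pair $a(t_1)\,a(t_2)$ into a propagator stretched between the points $\gamma(t_1)$ and $\gamma(t_2)$ on the loop, which is the $\langle \mathrm{Tr}(AA)\rangle$ process described above.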
This is the definition of the gauge field. Suppose you have an SU(2) symmetry; for definiteness, consider isospin. The notions of "proton" and "neutron" define two axes in isospin space, and you might want to say that it is arbitrary which two linear combinations of proton and neutron are the right basis vectors. Someone defines one basis of "proton" and "neutron" at one point, someone else defines a different basis at the same point, and you can't tell which one is right (pretend there is no charge on the proton, and the masses are exactly equal).
So you have the freedom to redefine the proton and neutron by a different SU(2) rotation at every point. This is the gauge freedom: you can multiply by a different group element $G(x)$ at each point $x$. Now to compare a proton at a point $x$ with a proton at a point $y$, you have to transport the proton along a curve from $x$ to $y$.
The gauge connection tells you what matrix you multiply by when you move in an infinitesimal direction $\delta x^\alpha$. The SU(2) matrix you rotate by is
$$ M^i_j = I + i A_{\alpha j}^i \delta x^\alpha$$
This is infinitesimally close to the identity, so the $A$ part is in the Lie algebra of SU(2). The factor of $i$ is conventional in physics: it makes the matrix $A$ Hermitian rather than anti-Hermitian. The anti-Hermitian convention, without the $i$, is the cleaner one and is the one used in mathematics. This means that $A$ is a linear combination of Pauli matrices. This gives you a concrete representation of the gauge field (suppressing the $i,j$ indices):
$$ A_\alpha = A_{\alpha k}\sigma^k $$
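A quick numerical sanity check of this representation, as a sketch: the component values below are made up, and `step_matrix` is just a name I am using for $M = I + iA_\alpha\,\delta x^\alpha$. It builds $A_\alpha$ from Pauli matrices and shows that $M$ fails to be unitary only at second order in the displacement.

```python
import numpy as np

# Pauli matrices sigma^k, stacked along the first axis.
sigma = np.array([
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
], dtype=complex)

# Hypothetical components A_{alpha k} at one point:
# rows = 2 spacetime directions, columns = 3 isospin directions.
A_components = np.array([[0.3, -0.1, 0.5],
                         [0.2, 0.4, -0.6]])

# A_alpha = A_{alpha k} sigma^k, a Hermitian 2x2 matrix for each alpha.
A = np.einsum("ak,kij->aij", A_components, sigma)

def step_matrix(dx):
    """M = I + i A_alpha dx^alpha for a small displacement dx."""
    return np.eye(2) + 1j * np.einsum("a,aij->ij", dx, A)

dx = np.array([1e-3, -2e-3])
M = step_matrix(dx)

# M M^dagger - I is of order |dx|^2, so M is unitary to first order.
unitarity_defect = np.linalg.norm(M @ M.conj().T - np.eye(2))
print(unitarity_defect)
```

Doubling `dx` quadruples the defect, which is the numerical signature of a first-order unitary matrix.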
You assume that the parallel transport is linear in the $\delta x$'s so that the notion is compatible with spacetime being a differentiable manifold: if you double the displacement, you double the infinitesimal rotation angle. You assume the rotation is infinitesimal by physical continuity.
From this, it is obvious that the parallel transport along a curve is the ordered product of the transport matrices for each of the infinitesimal segments that make up the curve:
$$ \prod (I + i A_\alpha dx^\alpha) = \mathrm{P}\exp\left(i \int A_\alpha dx^\alpha \right)$$
where the path-ordered exponential is defined as the limit of the product on the left. This is the nonabelian generalization of the phase acquired by a charged particle in an electromagnetic field along a path.
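The limit can be watched numerically. In the sketch below, `ordered_product` is a hypothetical helper that multiplies the factors $I + i\,a(t_n)\,dt$ with earlier segments to the left (the opposite ordering is an equally valid convention), and $a(t)$ stands for $A_\alpha\,dx^\alpha/dt$ along the path. It first checks convergence against a case with a known answer, then shows that the order of the segments matters once the matrices stop commuting.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def ordered_product(a_of_t, steps):
    """Product over segments of (I + i a(t_n) dt) on [0, 1], earlier ones to the left."""
    u = np.eye(2, dtype=complex)
    dt = 1.0 / steps
    for n in range(steps):
        u = u @ (np.eye(2) + 1j * a_of_t((n + 0.5) * dt) * dt)
    return u

# 1) Constant field along the path: the limit is the ordinary exponential.
theta = 0.7
exact = np.diag([np.exp(1j * theta), np.exp(-1j * theta)])
errors = [np.linalg.norm(ordered_product(lambda t: theta * sz, n) - exact)
          for n in (100, 1000, 10000)]
print(errors)   # shrinks roughly like 1/steps

# 2) Non-commuting segments: swapping their order changes the result.
def x_then_y(t):
    return sx if t < 0.5 else sy

def y_then_x(t):
    return sy if t < 0.5 else sx

u_xy = ordered_product(x_then_y, 10000)
u_yx = ordered_product(y_then_x, 10000)
ordering_gap = np.linalg.norm(u_xy - u_yx)
print(ordering_gap)
```

In the abelian case the second experiment would give a vanishing gap, and the product collapses to a single phase.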
The gauge field is then a map from curves to SU(N) matrices with the property that if you join paths end-to-end, the matrices multiply. The matrix associated to an infinitesimal closed loop differs from the identity by an amount proportional to the element of area enclosed by the loop; the coefficient is called the curvature. This is identical to general relativity. The whole exercise is a generalization of the connection of general relativity to cases where the groups are not rotations. Specializing to the rotation case gives GR.
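The area law for small loops can be checked in a few lines. This sketch assumes a constant, hypothetical SU(2) gauge field with non-commuting components `A_x` and `A_y`, so the derivative terms in the curvature drop out and only the commutator survives. The overall sign of the result depends on the ordering convention; here earlier legs are placed to the left.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

def exp_i(h):
    """exp(i h) for a Hermitian matrix h."""
    w, v = np.linalg.eigh(h)
    return (v * np.exp(1j * w)) @ v.conj().T

# Hypothetical constant gauge field components (Hermitian convention).
A_x = 0.8 * sx
A_y = 0.5 * sy

def square_loop(eps):
    """Transport around a square of side eps: legs +x, +y, -x, -y in that order."""
    return (exp_i(eps * A_x) @ exp_i(eps * A_y)
            @ exp_i(-eps * A_x) @ exp_i(-eps * A_y))

eps = 1e-3
deviation = (square_loop(eps) - np.eye(2)) / eps**2   # deviation per unit area
commutator = A_x @ A_y - A_y @ A_x

print(np.round(deviation, 3))
print(np.round(-commutator, 3))   # agrees with the line above up to O(eps)
```

Halving `eps` divides the deviation of the loop matrix from the identity by four, which is the statement that it is proportional to the enclosed area.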
Best Answer
The original papers by Gerard 't Hooft himself are quite readable.
Whenever I open these papers, I'm always awestruck.