As I understand it, the "covariant"
part of this comes from the fact that
the T∗M component changes covariantly
under coordinate changes and not how
the E component changes. Is this
correct?
Yes.
The motivation for the qualifier
"covariant" seems to ultimately stem
from coordinate-based definitions and
considerations. I've also seen uses of
"covariant" to mean "independent of
coordinate choice" in a more abstract
setting. Is this also valid?
Yes on this one as well.
Is there already terminology for this
"nonlinear covariant derivative". If
not, would "nonlinear covariant
derivative" or "nonlinear Koszul
connection" be appropriate for such?
There is a notion of nonlinear connection, even for the tangent bundle. Recall that the auto-parallel curves of the Levi-Civita connection of a metric are the geodesic. Recall also that these curves have a variational characterization, the local minimizers of length. The geodesic equations are the Euler-Lagrange equations of the length functional. The metric is used to define the lagrangian of this functional. More generally, given a lagrangian on a manifold, i.e., a function on the total space of the tangent bundle, the extremals of the resulting functional can be interpreted as the auto-parallel curves of a nonlinear connection on the tangent bundle. It is called nonlinear because the parallel transport defined by this connection is a nonlinear map. Finsler geometry is a special incarnation of this philosophy. The lagrangean in this case is a function whose restriction to each tangent space is a norm on that tangent space.
Here is another way of obtaining the Christoffel symbols with the symetry imposed by the torsion free condition
$$ \Gamma^i_{k\ell}=\Gamma^i_{\ell k}. $$
This goes back to Riemann's Habillitation.
Suppose that $(M,g)$ is a Riemann manifold of dimension $N$, $p\in M$. By fixing an orthonormal frame of $T_pM$ we can find local coordinates $(x^1,\dotsc, x^N)$ near $p$ such that, $\newcommand{\pa}{\partial} $
$$ x^i(p)=0, \;\; g=\sum_{i,j} g_{ij}(x) dx^i dx^j, $$
$$g_{ij}(x)= \delta_{ij} +\sum_{i,j}\left(\sum_k\pa_{x^k}g_{ij}(0) x^k\right) dx^i dx^j + O(|x|^2). $$
In other words, in these coordinates,
$$ g_{ij}(x)=\delta_{ij} +O(|x|). $$
Riemann was asking whether one can find new coordinates near $p$ such that in these coordinates the metric $g$ satisfies $g_{ij}=\delta_{ij}$.
As a first step, we can ask whether we can find a new system of coordinates such that, in these coordinates the metric $g$ is described by
$$ g=\sum_{ij}\hat{g}_{ij} dy^idy^j, $$
where
$$\hat{g}(y)=\delta_{ij}+ O(|y|^2). \tag{1} $$
The new coordinates $(y^j)$ are described in terms of the old coordinates $(x^i)$ by a family of Taylor approximations
$$y^j= x^j + \frac{1}{2}\sum_{ij}\gamma^j_{\ell k} x^\ell x^k + O(|x|^3),\;\; \gamma^j_{\ell k}=\gamma^j_{k\ell}. $$
The constraint (1) implies
$$ \gamma^j_{\ell k}=\frac{1}{2}\left(\pa_{x^\ell}g_{jk}+\pa_{x^\ell}g_{jk}-\pa_{x^j}g_{\ell k}\right)_{x=0}. $$
We see that, in the $x$ coordinates
$$ \Gamma^i_{k\ell}(p)=\gamma^i_{k\ell}, $$
because $g^{ij}(p)=\delta^{ij}$.
It took people several decades after Riemann's work to realize that the coefficients $\Gamma^i_{k\ell}$ are related to parallel transport, and ultimately, to a concept of connection.
Ultimately, to my mind, the best explanation for the torsion-free requirement comes from Cartan's moving frame technique. The clincher is the following technical fact: given a connection $\nabla$ on $TM$ and a $1$-form $\alpha\in \Omega^1(M)$ then for any vector fields $X,Y$ on $M$ we have
$$d\alpha(X,Y)= X\alpha(Y)-Y\alpha(X)-\alpha([X,Y]) $$
$$= (\nabla_X\alpha)(Y)-(\nabla_Y\alpha)(X)+\alpha(\nabla_XY-\nabla_YX)-\alpha([X,Y]) $$
$$= (\nabla_X\alpha)(Y)-(\nabla_Y\alpha)(X)+\alpha\bigl(\;T_\nabla(X,Y)\;\bigr). $$
If the torsion is zero, the above equality looses a term, and one obtains rather easily Cartan's structural equations of a Riemann manifold.
Best Answer
Your question is a bit vague, but let me try the following statement, which might be the kind of answer you are looking for: If $M$ is a manifold and $S\to M$ is a vector bundle over $M$ endowed with a connection $\nabla$, then a 'total differential equation' for sections $s$ of $S$ is an equation of the form $\nabla_X s = 0$ for all vector fields $X$. Then the Frobenius integrability conditions for this system are simply $\Omega^\nabla=0$, where $\Omega^\nabla$ is the curvature of $\nabla$, i.e., it is the $2$-form on $M$ with values in $\mathrm{End}(S)$ that is defined by $$ \Omega^\nabla(X,Y)s = \nabla_X\nabla_Ys - \nabla_Y\nabla_Xs - \nabla_{[X,Y]}s $$ for all vector fields $X$ and $Y$ on $M$. So, yes, the Frobenius integrability conditions are 'covariant', i.e., they can be formulated independently of coordinates.
Now, one may wonder, "But suppose I start with some linear total differential system $$ \frac{\partial u^a}{\partial x^i} = \Gamma^a_{bi}(x) u^b \tag{1} $$ in local coordinates. Does this apply?" The answer is 'yes': One regards the $u^a$ as the components of a section $s = (u^a)$ of a trivialized vector bundle $S$ and the matrices $\Gamma_i = \bigl(\Gamma^a_{bi}(x)\bigr)$ as sections of $\mathrm{End}(S)$ and one defines a connection $\nabla$ on this trivial bundle by the rule $$ \nabla_{\frac{\partial}{\partial x^i}}s = \frac{\partial s}{\partial x^i} - \Gamma_i\ s. $$ Then the curvature of $\nabla$ vanishes if and only if the Frobenius conditions are satisfied for the system (1).
If one has an inhomogeneous linear system $$ \frac{\partial u^a}{\partial x^i} = \Gamma^a_{bi}(x) u^b + \Lambda^a_i(x), \tag{2} $$ one can convert it to a homogeneous system by replacing $\Lambda^a_i(x)$ by $\Lambda^a_i(x)u^0$ in the above equation and adjoining the equations $$ \frac{\partial u^0}{\partial x^i} = 0, $$ making it a homogeneous linear system in one more variable. (Effectively, one allows the $a$ and $b$ indices (which run from, say, $1$ to $s$) to attain the value $0$, and sets $\Gamma^0_{bi}= \Gamma^0_{0i}=0$ and $\Gamma^a_{0i}=\Lambda^a_i$.) Again, the Frobenius conditions are equivalent to the vanishing of the curvature of the connection associated to the augmented system (now in $s{+}1$ unknowns rather than $s$ unknowns).
If one goes all the way to a fully nonlinear total differential system, say, $$ \frac{\partial u^a}{\partial x^i} = \Gamma^a_{i}(x,u), \tag{3} $$ then one can't (usually) reformulate this as the vanishing of the covariant derivatives of a section of something. However, there is still a 'covariant' coordinate-free interpretation of the Frobenius condition in terms of a tensor, but the tensor is defined on the total space $S$ (which has the $x$'s and $u$'s as local coordinates). The point is then, that, on $S$, one has a plane field $D$ (sometimes called a 'distribution') defined by the Pfaffian equations $$ du^a - \Gamma^a_{i}(x,u)\ dx^i = 0 $$ (summation on $i$ intended). Associated to this plane field $D\subset TS$, there is a natural skew-symmetric linear operator $$ \Phi: D\times D \to TS{\bigl/}D $$ i.e., a section of the bundle $\bigl(TS{\bigl/}D\bigr)\otimes \Lambda^2(D^\ast)$ over $S$ that vanishes if and only if the system (3) satisfies the Frobenius conditions. It is this operator that is the 'covariant' interpretation of the Frobenius integrability conditions. Of course, this operator is defined via the Lie bracket. On sections $X$ and $Y$ of $D$, it is just $$ \Phi(X,Y) \equiv [X,Y]\ \mathrm{mod}\ D, $$ so one gets the usual statement that the Frobenius integrability conditions are equivalent to the condition that the sections of $D$ be closed under Lie bracket.
Finally, to get back to when one can interpret the Frobenius conditions as the vanishing of the curvature of a connection, there is a rather general formulation of this that goes back all the way to Lie: Suppose that the system (3) can be written in the form $$ \frac{\partial u^a}{\partial x^i} = \Gamma^a_{i}(x,u) = \gamma_{i\sigma}(x) F^{a\sigma}(u), \tag{4} $$ for some functions $\gamma_{i\sigma}$ of $x$ and $F^{a\sigma}$ of $u$ (where summation over a new index $\sigma$ is assumed), and, further, assume that the vector fields $$ U^\sigma = F^{a\sigma}(u)\ \frac{\partial\ \ }{\partial u^a} $$ form a Lie algebra $L$, i.e., that there exist constants $c_\sigma^{\rho\tau}=-c_\sigma^{\tau\rho}$ such that $[U^\rho,U^\tau] = c_\sigma^{\rho\tau}U^\sigma$. Then the system (4) is what is known as a system of Lie type.
To avoid degenerate situations, one can assume, without loss of generality, that the $U^\sigma$ are a basis of $L$. In this case, one has that the Frobenius integrability condition for (4) is equivalent to the vanishing of the curvature of the $L$-valued connection $1$-form $$ \omega = U^\sigma\ \gamma_{i\sigma}(x)\ dx^i. $$ The use of this form of the equations belongs in the theory of symmetry analysis of differential equations, about which there is a very large literature.
Of course, recognizing when (3) can be written in the form (4) is not a trivial matter; there are methods for doing this involving Cartan's Method of Equivalence, but that would take a while to explain.