I'll change your notation a little to make things clearer (in my opinion, at least). Let $\pi \colon E \rightarrow M$ be a smooth vector bundle. With it comes the associated short exact sequence
$$ 0 \rightarrow VE \hookrightarrow TE \xrightarrow{d\pi} \pi^{*}(TM) \rightarrow 0 $$
of vector bundles over $E$. For the purpose of defining the covariant derivative, it is better to consider a left splitting $K \colon TE \rightarrow VE$ (over $E$). Note that $VE \cong \pi^{*}(E)$ using the natural isomorphism which allows us to identify $V_{(p,v)}E = T_{(p,v)}(E_p)$ (vectors which are tangent to the fiber $E_p$) with the vector space $E_p$. Denote this isomorphism by $\Phi$ and let $\pi_{\sharp} \colon \pi^{*}(E) \rightarrow E$ be the natural map of vector bundles that covers $\pi$. Then we can define the covariant derivative of a section $s \in \Gamma(E)$ by
$$ \nabla s = \pi_{\sharp} \circ \Phi \circ K \circ ds.$$
More explicitly, $s$ is a map from $M$ to $E$ and $ds \colon TM \rightarrow TE$ is the regular differential. To get the covariant derivative, we take the regular derivative $ds$, project it to the vertical space using $K$ and then identify the vertical space with $E$ to get back a section of $E$ over $M$. If the splitting $K$ satisfies the equivariance conditions appropriate for a connection on a vector bundle, this will reconstruct the usual covariant derivative.
Let us try and see concretely how the process above works when $E = M \times \mathbb{R}^k$ is the trivial bundle. Fix some coordinate neighborhood $U$ with coordinates $x^1,\dots,x^n$ and let $\xi^1,\dots,\xi^k$ denote the coordinates on $\mathbb{R}^k$. Then $\pi^{-1}(U)$ is a coordinate neighborhood with coordinates I'll denote by $\tilde{x}^1,\dots,\tilde{x}^n$ and $\tilde{\xi}^1,\dots,\tilde{\xi}^k$. We have $\tilde{x}^i = x^i \circ \pi_1$ and $\tilde{\xi}^i = \xi^i \circ \pi_2$ and I use the $\tilde \,$ to differentiate between the coordinates on the base / fiber and on the total space.
With this notation, the vertical space $V_{(p,v)}E$ at $(p,v)$ is precisely $$\operatorname{span} \left \{ \frac{\partial}{\partial \tilde{\xi}^1}|_{(p,v)}, \dots, \frac{\partial}{\partial \tilde{\xi}^k}|_{(p,v)} \right \}. $$
A projection $K$ from $TE$ onto $VE$ will look like:
$$ K|_{(p,v)} = a_i^j(p,v) d\tilde{x}^i \otimes \frac{\partial}{\partial \tilde{\xi}^j} + d\tilde{\xi}^i \otimes \frac{\partial}{\partial \tilde{\xi}^i}$$
(the image must be the vertical bundle and it must satisfy $K^2 = K$).
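As a quick check of both requirements, using only the coordinate form above: $K$ acts on the coordinate basis of $T_{(p,v)}E$ by
$$ K\left( \frac{\partial}{\partial \tilde{x}^i} \right) = a_i^j(p,v) \frac{\partial}{\partial \tilde{\xi}^j}, \qquad K\left( \frac{\partial}{\partial \tilde{\xi}^i} \right) = \frac{\partial}{\partial \tilde{\xi}^i}, $$
so the image of $K$ is vertical and $K$ is the identity on vertical vectors, which gives $K^2 = K$. The kernel of $K$ at $(p,v)$ is spanned by the $n$ vectors
$$ \frac{\partial}{\partial \tilde{x}^i} - a_i^j(p,v) \frac{\partial}{\partial \tilde{\xi}^j}, \qquad i = 1,\dots,n, $$
since applying $K$ to each of them gives $a_i^j \frac{\partial}{\partial \tilde{\xi}^j} - a_i^j \frac{\partial}{\partial \tilde{\xi}^j} = 0$. These vectors span a complement to $V_{(p,v)}E$.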
Now, let $s \colon M \rightarrow M \times \mathbb{R}^k$ be a section and write $s(p) = (p, f(p))$ for some $f = (f^1,\dots,f^k) \colon M \rightarrow \mathbb{R}^k$. Set
$$e_i(p) := (p, \underbrace{(0,\dots,0,1,0,\dots,0)}_{i\text{th place}})$$
to be the constant sections corresponding to the standard basis vectors so $s = f^i e_i$. Let us see what the covariant derivative of $s$ in the direction $\frac{\partial}{\partial x^l} = \partial_l$ (in the base) at the point $p$ looks like:
$$ ds|_{p} = dx^i \otimes \frac{\partial}{\partial \tilde{x}^i} + \frac{\partial f^i}{\partial x^j} dx^j \otimes \frac{\partial}{\partial \tilde{\xi}^i}, \\
K \circ ds
= \left( a_i^j(p,f(p)) + \frac{\partial f^j}{\partial x^i}(p) \right) dx^i \otimes \frac{\partial}{\partial \tilde{\xi}^j}, \\
\nabla_l(s)(p) = \left( a_l^j(p, f(p)) + \frac{\partial f^j}{\partial x^l}(p) \right) e_j(p). $$
Note that $\nabla_l(s)(p)$ has two components. The second is the regular directional derivative of the components of $s$ with respect to the frame $(e_1,\dots,e_k)$ in the direction $\partial_l$. The first comes from the projection $K$. If $a_i^j \equiv 0$, this is gone. Also, the components $a_i^j$ depend both on the point $p$ and the value $f(p)$ (this reflects the fact that $K$ gives us a projection of $TE$ onto $\pi^{*}(E)$). For a general vector bundle, this is the local picture.
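To illustrate one common special case (an assumption added here, not something forced by the construction above), suppose $K$ is linear in the fiber variable, so that $a_i^j(p,v) = \Gamma_{ik}^j(p)\, v^k$ for some functions $\Gamma_{ik}^j$ on $U$. Substituting $v = f(p)$ into the formula for $\nabla_l(s)(p)$ gives
$$ \nabla_l(s)(p) = \left( \frac{\partial f^j}{\partial x^l}(p) + \Gamma_{lk}^j(p) f^k(p) \right) e_j(p). $$
In particular $\nabla_l(e_k) = \Gamma_{lk}^j e_j$, and for a function $g$ on $U$ one gets the Leibniz rule $\nabla_l(g s) = \frac{\partial g}{\partial x^l} s + g \nabla_l(s)$, because the first term differentiates $g f^j$ by the product rule while the second is linear in $f$.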
Regarding your questions, we're not ignoring the variation between fibers. This is encoded in the particular way $K$ projects onto $VE$ (through the coefficients $a_i^j$ which give rise under certain assumptions to the Christoffel symbols $\Gamma_{ik}^j$ of the connection). While the image of $K$ is always $VE$, the kernel of $K$ is different at each point and provides us with the horizontal space. The horizontal space tells us how we should identify fibers infinitesimally along curves over the base space.
Covariant differentiation allows us to differentiate a section along a vector field on $M$ and get back a section. It is done by performing regular differentiation and obtaining a tangent vector to $E$ which (for a non-zero direction in the base) is never tangent to the fiber. The connection mechanism, via $K$, provides us with a way to project this tangent vector in a consistent way to get a vector which is tangent to the fiber and then identify it with an element of the fiber.
Let us take your description as our primitive idea, namely that a connection should be a method by which “a motion at the bottom induces a corresponding motion on top”. This is what you’ve roughly described by connecting the different fibers in a fiber bundle. Ok, this was already at the level of curves, but usually this is difficult to ‘prescribe’ in practice, so let us look at the ‘infinitesimal level’ first, i.e. at the level of tangent spaces. So, our new mantra is ‘a tangent vector at the bottom must induce a corresponding tangent vector on top’. See the picture I drew here (sure I talk about vector bundles there, but that’s not essential).
Slightly more precisely, given a fiber bundle $(X,\pi, M)$, what we want is a map $L:TM\times_MX\to TX$ (the $\times_M$ means consider the fiber bundle over $M$ whose fiber over a point $x\in M$ is $T_xM\times X_x$) which assigns to each tangent vector $k_x\in T_xM$ and each fiber element $\xi_x\in X_x$, a tangent vector $L(k_x,\xi_x)\in T_{\xi_x}X$ such that $T\pi$ projects $L(k_x,\xi_x)$ back down to $k_x$. It’s kind of like if you imagine a tall building, and the person $x$ on the ground floor moves a little bit in the direction $k_x$. He then tells each of his upstairs neighbours $\xi_x$ to ‘follow his lead’ and move accordingly by an amount $L(k_x,\xi_x)$. The reason I used the notation $L$ is because this is a ‘lifting map’. It lifts the tangent vector $k_x$ on the base to a tangent vector $L(k_x,\xi_x)$ at height $\xi_x$. The fact that the projection of $L(k_x,\xi_x)$ under $T\pi$ is $k_x$ says that (if $k_x$ is non-zero) the vector $L(k_x,\xi_x)$ is not in the kernel of $T\pi$, i.e. it is not tangent to the fiber $X_{x}$, or said in another way, it does not belong to the vertical space $V_{\xi_x}X$.
Now, let us also suppose that $L$ depends linearly on the $k_x$ slot. Then, for each $\xi_x\in X_x$, $L(\cdot,\xi_x)$ maps $T_xM$ linearly and bijectively onto a subspace of $T_{\xi_x}X$, which I shall call $H_{\xi_x}X$. We then have a direct sum decomposition $T_{\xi_x}X=V_{\xi_x}X\oplus H_{\xi_x}X$ (an exercise in linear algebra; what’s the ‘correct’ generalization?). So you can think of $L(\cdot,\xi_x)$ as ‘lifting’ the tangent space $T_xM$ to the height $\xi_x$ (I visualize this process as carrying a plate up a hill).
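One possible way to do the exercise, using only that $T\pi\circ L(\cdot,\xi_x)=\text{id}_{T_xM}$ and that $V_{\xi_x}X=\ker T\pi_{\xi_x}$: any $\zeta\in T_{\xi_x}X$ can be written as
$$ \zeta = \underbrace{\big(\zeta - L(T\pi(\zeta),\xi_x)\big)}_{\in V_{\xi_x}X} + \underbrace{L(T\pi(\zeta),\xi_x)}_{\in H_{\xi_x}X}, $$
where the first term is vertical because $T\pi$ sends it to $T\pi(\zeta)-T\pi(\zeta)=0$. The sum is direct: if $L(k_x,\xi_x)$ is vertical, then $k_x = T\pi\big(L(k_x,\xi_x)\big) = 0$, so $V_{\xi_x}X\cap H_{\xi_x}X=\{0\}$.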
So you see, by starting with the naive idea of ‘moving in the bottom must induce a corresponding movement on top’, we obtain a direct sum decomposition $TX=VX\oplus HX$ of the tangent bundle $TX$ of the fiber bundle.
Alternatively, you can forget about a choice of lifting map $L$, and directly start with a choice of a decomposition $TX=VX\oplus HX$. Now, why should a choice of complementary subbundle $HX$ intuitively convey information about ‘connections’? Well it’s pretty obvious. Let $\zeta\in HX$ be non-zero, and for notational concreteness say $\zeta\in H_{\xi_x}X$ for some $\xi_x\in X$ and $x\in M$. Then by definition of being a tangent vector to $X$, it means I can find a smooth curve $\gamma(t)$ which has $\gamma(0)=\xi_x$ and $\dot{\gamma}(0)=\zeta$. The curve $\gamma(t)$ cannot entirely lie in the fiber $X_x$ because then its tangent vector would lie in the vertical space $V_{\xi_x}X$. Ok, so this means for small $t$, $\gamma(t)$ will end up in a fiber different from $X_x$ where we started initially. So you see a choice of complementary subbundle (called horizontal subbundle) gives us a way to go from a given fiber to a nearby, different fiber. This is all the intuition behind Ehresmann connections. But really, the tell-tale sign that this is a good idea is to look at the picture I linked: ‘clearly’ the green arrow is pointing towards a different fiber :)
To make some of the things I said above precise (namely going from a complementary subbundle $HX$ to getting an actual isomorphism of different fibers), you’d write down an ODE and invoke existence and uniqueness.
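A minimal numerical sketch of such an ODE, assuming the simplest possible setting: the trivial bundle $\mathbb{R}^2\times\mathbb{R}^2\to\mathbb{R}^2$ with base coordinates $x^i$, fiber coordinates $\xi^j$, and horizontal spaces spanned by $\frac{\partial}{\partial x^i} - a_i^j(p,v)\frac{\partial}{\partial \xi^j}$. A curve $(c(t),v(t))$ over a base curve $c$ is then horizontal exactly when $\dot v^j = -a_i^j(c(t),v(t))\,\dot c^i(t)$. The coefficients `a`, the loop `circle` and the helper names below are hypothetical choices made for illustration only.

```python
import math


def a(i, p, v):
    """Return (a_i^1, a_i^2) at the point (p, v) of the total space.

    Hypothetical choice: a_1 = 0 and a_2 = x * J v, where J rotates by 90 degrees.
    """
    x, _y = p
    if i == 0:
        return (0.0, 0.0)
    return (-x * v[1], x * v[0])


def rhs(curve, curve_dot, t, v):
    """Right-hand side of the horizontal-lift ODE v'^j = -a_i^j(c(t), v) c'^i(t)."""
    p, dp = curve(t), curve_dot(t)
    out = [0.0, 0.0]
    for i in range(2):
        a_i = a(i, p, v)
        for j in range(2):
            out[j] -= a_i[j] * dp[i]
    return out


def transport(curve, curve_dot, v0, t0, t1, steps=2000):
    """Integrate the ODE with the classical fourth-order Runge-Kutta scheme."""
    h = (t1 - t0) / steps
    t, v = t0, list(v0)
    for _ in range(steps):
        k1 = rhs(curve, curve_dot, t, v)
        k2 = rhs(curve, curve_dot, t + h / 2, [v[j] + h / 2 * k1[j] for j in range(2)])
        k3 = rhs(curve, curve_dot, t + h / 2, [v[j] + h / 2 * k2[j] for j in range(2)])
        k4 = rhs(curve, curve_dot, t + h, [v[j] + h * k3[j] for j in range(2)])
        v = [v[j] + h / 6 * (k1[j] + 2 * k2[j] + 2 * k3[j] + k4[j]) for j in range(2)]
        t += h
    return v


def circle(t):
    """Unit circle in the base, traversed once for t in [0, 2*pi]."""
    return (math.cos(t), math.sin(t))


def circle_dot(t):
    """Velocity of the unit circle."""
    return (-math.sin(t), math.cos(t))


# Carry the fiber element (1, 0) once around the loop.
v_end = transport(circle, circle_dot, (1.0, 0.0), 0.0, 2 * math.pi)
```

For this particular choice the ODE reads $\dot v = -\cos^2(t)\, Jv$, whose solution is a rotation by the angle $-\int_0^{2\pi}\cos^2 t\,dt = -\pi$, so `v_end` should be close to $(-1, 0)$. The identification of the fiber with itself obtained by going around the loop is therefore not the identity, even though the base curve is closed.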
Previously, we’ve seen how specifying the ‘lifting map $L$’ gives us a horizontal subbundle. Let us now see the converse. Given a decomposition $TX=VX\oplus HX$, the rank-nullity theorem implies that for each $x\in M$ and $\xi_x\in X_x$, the tangent map $T\pi_{\xi_x}:T_{\xi_x}X\to T_xM$ restricts to a linear isomorphism $H_{\xi_x}X\to T_xM$ (since we’re ignoring the kernel which is $V_{\xi_x}X$). The inverse of this isomorphism corresponds exactly to the lifting map $L(\cdot, \xi_x)$ above.
Ok so far I talked about general fiber bundles, but for principal bundles, we would like for our horizontal subspaces to ‘vary consistently’ as our group acts. Recall that the group orbits are the fibers, so if a group element $g$ moves a point $\xi_x$ in the fiber to the point $\xi_xg$, then we’d like the induced map on tangent spaces to map the horizontal space $H_{\xi_x}X$ onto $H_{\xi_xg}X$. This is just the obvious thing to require.
Let $(X,\pi,M, G)$ be a principal bundle. Hopefully you’re now happy with the idea that specifying a horizontal subbundle (i.e. a complement to $VX$) does indeed correspond to ‘infinitesimally connecting fibers’ (once you draw a picture this becomes almost tautological in hindsight). Of course for a principal connection the horizontal subbundle must vary accordingly with $G$.
First of all I said that directly talking about curves and their lifts is difficult, so we instead formulated things at the tangent space level. But now, specifying a collection of subspaces, and working with them is a little unwieldy, so we look for an alternative way to describe things. This will lead us to the connection 1-form, but first a general observation.
Fact.
Let $(X,\pi,M,G)$ be a principal bundle, $m:X\times G\to X$ the group action map, and let $VX$ be the vertical subbundle. Then, the mapping $\Phi:X\times\mathfrak{g}\to VX$ given by $\Phi(\xi,\gamma):=T\left(m(\xi,\cdot)\right)_e[\gamma]$ is smooth, fiberwise linear and makes the following diagram commute:
$\require{AMScd}$
\begin{CD}
X\times\mathfrak{g} @>{\Phi}>> VX \\
@V{\text{pr}_1}VV @VV{\pi|_{VX\to X}}V \\
X @>>{\text{id}_X}> X
\end{CD}
In other words, $\Phi$ provides a vector bundle isomorphism from the trivial vector bundle $X\times\mathfrak{g}$ over $X$, onto the vector bundle $VX$ over $X$.
In essence, this is saying that if a group element can act on $X$, then by taking derivatives, the Lie algebra elements can also act on $X$.
From here, it’s basically following your nose. Suppose we have a principal connection in the form of a direct sum decomposition $TX=VX\oplus HX$. Let $P_V:TX\to VX$ be the induced projection (warning: even though the notation may indicate otherwise, the definition of $P_V$ depends on $HX$!). Now, we can consider the map $\omega$ defined as the following triple composition:
$\require{AMScd}$
\begin{CD}
TX @>{P_V} >> VX @>{\Phi^{-1}} >> X\times\mathfrak{g} @>{\text{pr}_2}>> \mathfrak{g}.
\end{CD}
This is nothing but a Lie-algebra-valued $1$-form on $X$ (why? because for each $\xi\in X$, $\omega$ restricts to a linear map $T_{\xi}X\to \mathfrak{g}$). You can check this has all the properties of the connection 1-form. To recover the subspaces, you simply take the kernel of $\omega$. This completes the link between the two ideas.
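Two of those properties can be sketched directly from the definition, assuming the group acts on the right by $R_g(\xi)=m(\xi,g)=\xi g$ and that $HX$ is preserved by each $TR_g$, as required of a principal connection. First, since $P_V$ is the identity on $VX$,
$$ \omega\big(\Phi(\xi,\gamma)\big) = \text{pr}_2\Big(\Phi^{-1}\big(\Phi(\xi,\gamma)\big)\Big) = \gamma \qquad \text{for all } \xi\in X,\ \gamma\in\mathfrak{g}. $$
Second, from $\xi\exp(t\gamma)\,g = (\xi g)\exp\big(t\operatorname{Ad}_{g^{-1}}\gamma\big)$ one gets $TR_g\big(\Phi(\xi,\gamma)\big)=\Phi\big(\xi g,\operatorname{Ad}_{g^{-1}}\gamma\big)$. Splitting a tangent vector into its vertical and horizontal parts and using that $TR_g$ maps horizontal vectors to horizontal vectors then gives
$$ R_g^{*}\,\omega = \operatorname{Ad}_{g^{-1}}\circ\,\omega . $$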
Of course I’ve glossed over several details, but that’s what textbooks are for. The only way to appreciate these concepts is to draw your own pictures and make sense of what it is you’re drawing (I personally don’t understand something unless I’ve drawn it myself… even if in hindsight I end up drawing what someone else has already drawn).
Edit: A Summary.
So far we have seen three different ways of describing the same thing. This comes down to the ‘trichotomy’ of descriptions (I’m omitting some details regarding how the group compatibility comes into play):
- directly: This is the Ehresmann definition of just telling you outright which complementary subbundle $HX$ to choose.
- ‘parametrically’: this means you look at the image of some other map. In my answer, I described this using the lifting map $L$; its image is what we define $HX$ to be.
- ‘implicitly’: you take a level set of some map. The connection 1-form approach falls under this category because we define $HX$ to be the kernel of $\omega$. Note that another way of doing things is that we could specify a map $P:TX\to TX$ such that $P\circ P=P$ and its image equals $VX$. Then we can define $HX$ to be the kernel of $P$ (with this approach this $P$ will equal what I called $P_V$ above. Then, $I-P$ will be the projection onto $HX$; review in the linear algebraic case if necessary).
So, there are many different ways one can formally describe something, but if you think for a moment, they’re all saying the same thing, just in a different guise. The motivation for all this comes, as always, from the really basic linear algebra case, so if in doubt, one should review the various descriptions there.
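The linear algebra case can be sketched in a few lines, assuming the toy model $T=\mathbb{R}^2$ with coordinates $(x,\xi)$, vertical space $V=\operatorname{span}\{(0,1)\}$ and the projection map to the base $(x,\xi)\mapsto x$. The single coefficient `A` and the function names `P`, `lift` and `omega` are hypothetical, chosen only to mirror the three descriptions.

```python
A = 0.5  # hypothetical coefficient fixing which complement H of V is chosen


def P(w):
    """'Implicit' description: idempotent map with image V; H is its kernel."""
    x, xi = w
    return (0.0, A * x + xi)


def lift(k):
    """'Parametric' description: lifting map from the base R into T; H is its image."""
    return (k, -A * k)


def omega(w):
    """1-form version: the vertical component of P(w); H is the kernel of omega."""
    return P(w)[1]


w = (2.0, 3.0)
vertical = P(w)                                        # part of w lying in V
horizontal = (w[0] - vertical[0], w[1] - vertical[1])  # (I - P)(w), part lying in H
```

Here `horizontal` coincides with `lift(w[0])`, `omega` vanishes on every lifted vector, and applying `P` twice changes nothing, so the three descriptions single out the same line $H=\operatorname{span}\{(1,-A)\}$.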
Best Answer
You can find this and more covered in Tu's book Differential Geometry: Curvature, Connections, and Characteristic Classes. Here's the link: https://www.springer.com/gp/book/9783319550824
You'll find this in the later sections of the book, but I would say it's a fairly leisurely read.