I'll change your notation a little to make things clearer (in my opinion, at least). Let $\pi \colon E \rightarrow M$ be a smooth vector bundle. With it comes the associated short exact sequence
$$ 0 \rightarrow VE \hookrightarrow TE \xrightarrow{d\pi} \pi^{*}(TM) \rightarrow 0 $$
of vector bundles over $E$. For the purpose of defining the covariant derivative, it is better to consider a left splitting $K \colon TE \rightarrow VE$ (over $E$). Note that $VE \cong \pi^{*}(E)$ via the natural isomorphism which allows us to identify $V_{(p,v)}E = T_{(p,v)}(E_p)$ (vectors which are tangent to the fiber $E_p$) with the vector space $E_p$. Denote this isomorphism by $\Phi$ and let $\pi_{\sharp} \colon \pi^{*}(E) \rightarrow E$ be the natural map of vector bundles that covers $\pi$. Then we can define the covariant derivative of a section $s \in \Gamma(E)$ by
$$ \nabla s = \pi_{\sharp} \circ \Phi \circ K \circ ds.$$
More explicitly, $s$ is a map from $M$ to $E$ and $ds \colon TM \rightarrow TE$ is the regular differential. To get the covariant derivative, we take the regular derivative $ds$, project it to the vertical space using $K$ and then identify the vertical space with $E$ to get back a section of $E$ over $M$. If the splitting $K$ satisfies the equivariance conditions appropriate for a connection on a vector bundle, this will reconstruct the usual covariant derivative.
Let us see concretely how the process above works when $E = M \times \mathbb{R}^k$ is the trivial bundle. Fix some coordinate neighborhood $U$ with coordinates $x^1,\dots,x^n$ and let $\xi^1,\dots,\xi^k$ denote the coordinates on $\mathbb{R}^k$. Then $\pi^{-1}(U)$ is a coordinate neighborhood with coordinates I'll denote by $\tilde{x}^1,\dots,\tilde{x}^n$ and $\tilde{\xi}^1,\dots,\tilde{\xi}^k$. We have $\tilde{x}^i = x^i \circ \pi_1$ and $\tilde{\xi}^i = \xi^i \circ \pi_2$, and I use the tilde to distinguish the coordinates on the total space from those on the base / fiber.
With this notation, the vertical space $V_{(p,v)}E$ at $(p,v)$ is precisely $$\operatorname{span} \left \{ \frac{\partial}{\partial \tilde{\xi}^1}|_{(p,v)}, \dots, \frac{\partial}{\partial \tilde{\xi}^k}|_{(p,v)} \right \}. $$
A projection $K$ from $TE$ onto $VE$ will look like:
$$ K|_{(p,v)} = a_i^j(p,v) d\tilde{x}^i \otimes \frac{\partial}{\partial \tilde{\xi}^j} + d\tilde{\xi}^i \otimes \frac{\partial}{\partial \tilde{\xi}^i}$$
(the image must be the vertical bundle and it must satisfy $K^2 = K$).
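As a quick check of these two conditions (a sketch using only the formula for $K$ above), evaluate $K$ on the coordinate basis of $T_{(p,v)}E$:
$$ K\left( \frac{\partial}{\partial \tilde{x}^i} \right) = a_i^j(p,v) \frac{\partial}{\partial \tilde{\xi}^j}, \qquad K\left( \frac{\partial}{\partial \tilde{\xi}^i} \right) = \frac{\partial}{\partial \tilde{\xi}^i}. $$
Both images lie in the span of the $\frac{\partial}{\partial \tilde{\xi}^j}$, and $K$ restricts to the identity on that span, so applying $K$ a second time changes nothing: $K^2 = K$. The functions $a_i^j$ are otherwise unconstrained at this stage.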
Now, let $s \colon M \rightarrow M \times \mathbb{R}^k$ be a section and write $s(p) = (p, f(p))$ for some $f = (f^1,\dots,f^k) \colon M \rightarrow \mathbb{R}^k$. Set
$$e_i(p) := (p, \underbrace{(0,\dots,0,1,0,\dots,0)}_{i\text{th place}})$$
to be the constant sections corresponding to the standard basis vectors, so $s = f^i e_i$. Let us see what the covariant derivative of $s$ in the direction $\frac{\partial}{\partial x^l} = \partial_l$ (in the base) at the point $p$ looks like:
$$ \begin{aligned}
ds|_{p} &= dx^i \otimes \frac{\partial}{\partial \tilde{x}^i} + \frac{\partial f^i}{\partial x^j} dx^j \otimes \frac{\partial}{\partial \tilde{\xi}^i}, \\
K \circ ds &= \left( a_i^j(p,f(p)) + \frac{\partial f^j}{\partial x^i}(p) \right) dx^i \otimes \frac{\partial}{\partial \tilde{\xi}^j}, \\
\nabla_l(s)(p) &= \left( a_l^j(p, f(p)) + \frac{\partial f^j}{\partial x^l}(p) \right) e_j(p).
\end{aligned} $$
Note that $\nabla_l(s)(p)$ has two components. The second is the regular directional derivative of the components of $s$ with respect to the frame $(e_1,\dots,e_k)$ in the direction $\partial_l$. The first comes from the projection $K$. If $a_i^j \equiv 0$, this term vanishes. Also, the components $a_i^j$ depend both on the point $p$ and the value $f(p)$ (this reflects the fact that $K$ gives us a projection of $TE$ onto $\pi^{*}(E)$). For a general vector bundle, this is the local picture.
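The local formula can also be checked numerically. The following is a minimal sketch, not a general implementation: it assumes the base is $\mathbb{R}$ with a single coordinate $x$, the fiber is $\mathbb{R}^2$, and the coefficients are the hypothetical choice $a(x, v) = (v^2, -v^1)$, picked only for illustration. The function names `a`, `f` and `covariant_derivative` are likewise made up for this example, and the derivative of $f$ is approximated by a central difference.

```python
import math

def a(x, v):
    # Hypothetical connection coefficients a^j(x, v) on the trivial bundle R x R^2.
    # This particular choice is linear in the fiber variable v.
    return (v[1], -v[0])

def f(x):
    # Components of the section s(x) = (x, f(x)).
    return (math.cos(x), math.sin(x))

def covariant_derivative(f, a, x, h=1e-6):
    # Components of nabla s at x: a^j(x, f(x)) + d f^j / dx,
    # with the derivative approximated by a central difference of step h.
    fx = f(x)
    f_plus, f_minus = f(x + h), f(x - h)
    df = tuple((p - m) / (2 * h) for p, m in zip(f_plus, f_minus))
    ax = a(x, fx)
    return tuple(c + d for c, d in zip(ax, df))

# For this choice of a and f the two contributions cancel,
# so the section is parallel up to finite-difference error.
print(covariant_derivative(f, a, 0.7))
```

With these choices, $a(x, f(x)) = (\sin x, -\cos x)$ and $f'(x) = (-\sin x, \cos x)$, so the printed components are zero up to discretization error. Replacing `a` by the zero map returns the plain derivative of $f$.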
Regarding your questions, we're not ignoring the variation between fibers. This is encoded in the particular way $K$ projects onto $VE$ (through the coefficients $a_i^j$, which under certain assumptions give rise to the Christoffel symbols $\Gamma_{ik}^j$ of the connection). While the image of $K$ is always $VE$, the kernel of $K$ is different at each point and provides us with the horizontal space. The horizontal space tells us how we should identify fibers infinitesimally along curves over the base space.
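In the trivial-bundle coordinates used above, both statements can be written out explicitly. The kernel of $K$ at $(p,v)$ is
$$ \ker K|_{(p,v)} = \operatorname{span} \left\{ \frac{\partial}{\partial \tilde{x}^i}\Big|_{(p,v)} - a_i^j(p,v) \frac{\partial}{\partial \tilde{\xi}^j}\Big|_{(p,v)} \right\}_{i=1}^{n}, $$
since $K$ sends each of these $n$ vectors to $a_i^j \frac{\partial}{\partial \tilde{\xi}^j} - a_i^j \frac{\partial}{\partial \tilde{\xi}^j} = 0$. As for the Christoffel symbols, here is a sketch under the assumption (my reading of the "certain assumptions", not spelled out above) that $K$ is linear in the fiber variable, that is, $a_i^j(p,v) = \Gamma_{ik}^j(p)\, v^k$ for some functions $\Gamma_{ik}^j$ on $U$. Substituting $v = f(p)$ into the formula for $\nabla_l(s)(p)$ gives
$$ \nabla_l s = \left( \frac{\partial f^j}{\partial x^l} + \Gamma_{lk}^j f^k \right) e_j, $$
which is the familiar coordinate expression for a linear connection.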
Covariant differentiation allows us to differentiate a section along a vector field on $M$ and get back a section. It is done by performing regular differentiation and obtaining a tangent vector to $E$ which, for a nonzero direction, is never tangent to the fiber (since $d\pi \circ ds = \mathrm{id}$). The connection mechanism, via $K$, provides us with a consistent way to project this tangent vector onto a vector which is tangent to the fiber, and then to identify it with an element of the fiber.
Best Answer
First the calculation:
Let $z_1,\dots,z_n$ be a frame of $TY$ in some neighbourhood of a point and let $z_1',\dots,z_n'$ denote their horizontal lifts. Any vector field $v$ in that neighbourhood admits a unique expansion $v= \sum_i \alpha_i\,z_i$, where the $\alpha_i$ are functions whose value at a point depends only on the value of $v$ at that point. When you lift $v$ horizontally, you just get $\sum_i \alpha_i'\, z_i'$, where the $\alpha_i'$ are now functions $X\to\Bbb R$ given by $\alpha_i'(x) = \alpha_i (f(x))$. If you take the commutator of two horizontal lifts $v'=\sum_i\alpha_i' z_i'$ and $u'=\sum_j\beta_j' z_j'$, the Leibniz rule $[\phi A, \psi B] = \phi\psi[A,B] + \phi\, A(\psi)\, B - \psi\, B(\phi)\, A$ gives: $$\sum_{ij}[\alpha_i' z_i', \beta_j' z_j' ]=\sum_{ij} \alpha_i'\beta_j' [z_i',z_j'] + \sum_{ij}\left(\alpha_i' z_i'(\beta_j')\,z_j' - \beta_j' z_j'(\alpha_i')\,z_i'\right)$$ The second summand is horizontal, so the vertical component is contained entirely in the first summand. But at an arbitrary point $x$ that summand depends only on the values $\alpha_i(f(x)), \beta_j(f(x))$, which are determined by $v_{f(x)}$ and $u_{f(x)}$, and on the value of $[z_i',z_j']_x$, which is independent of $u,v$. In other words, it does not depend on what the fields $u, v$ look like in a neighbourhood of the point.
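In formulas (a sketch, writing $\operatorname{vert}$ for the projection onto the vertical bundle along the horizontal one, a notation not used above), the conclusion of the calculation reads
$$ \operatorname{vert}\,[v',u']_x = \sum_{ij} \alpha_i(f(x))\,\beta_j(f(x))\, \operatorname{vert}\,[z_i',z_j']_x . $$
The right-hand side is linear over functions in each of $v$ and $u$ and involves no derivatives of the coefficients, which is exactly the statement that the vertical part of the commutator is tensorial in $v$ and $u$.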
As to the intuition:
The way a horizontal field varies in fibre direction is uniquely determined by the value of the field at the base-point. Since the commutator describes the way two vector fields mutually change in a direction, the vertical component of the commutator describes the mutual vertical variation of the two fields. But the vertical behaviour is completely determined by the value at the base-point.