As I understand it, these are your questions:
- How does one define the derivative of a vector field? Do we just take the "derivatives" of each vector in the field? If so, what does it mean to take the derivative of a differential operator, anyway?
- Why does the total derivative of a scalar field give information about rates of change, while the "total derivative" of a vector field gives the pushforward (which doesn't seem to relate to rates of change)?
I think the best way to answer these questions is to provide a broader context:
In calculus, we ask how to find derivatives of functions $F\colon \mathbb{R}^m \to \mathbb{R}^n$. The typical answer is the total derivative $DF\colon \mathbb{R}^m \to L(\mathbb{R}^m, \mathbb{R}^n)$, which assigns to each point $p \in \mathbb{R}^m$ a linear map $D_pF \in L(\mathbb{R}^m, \mathbb{R}^n)$. With respect to the standard bases, this linear map can be represented as a matrix:
$$D_pF = \begin{pmatrix}
\left.\frac{\partial F^1}{\partial x^1}\right|_p & \cdots & \left.\frac{\partial F^1}{\partial x^m}\right|_p \\
\vdots & & \vdots \\
\left.\frac{\partial F^n}{\partial x^1}\right|_p & \cdots & \left.\frac{\partial F^n}{\partial x^m}\right|_p
\end{pmatrix}$$
Personally, I think this encodes the idea of "rate of change" very well. (Just look at all those partial derivatives!)
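To make the matrix concrete, here is a minimal numerical sketch that approximates each entry $\partial F^i/\partial x^j$ by a central difference. The helper name `jacobian`, the step size `h`, and the sample map `F` are all assumptions chosen for illustration; nothing here comes from a particular library.

```python
def jacobian(F, p, h=1e-6):
    """Approximate the matrix of D_pF by central differences.

    F maps a list of m floats to a list of n floats.
    Entry [i][j] of the result approximates dF^i/dx^j at p.
    """
    m = len(p)
    columns = []
    for j in range(m):
        forward = list(p)
        backward = list(p)
        forward[j] += h
        backward[j] -= h
        F_forward, F_backward = F(forward), F(backward)
        # Column j holds the partial derivatives of every component w.r.t. x^j.
        columns.append([(a - b) / (2 * h) for a, b in zip(F_forward, F_backward)])
    n = len(columns[0])
    # Transpose so that rows are indexed by the component i of F.
    return [[columns[j][i] for j in range(m)] for i in range(n)]


def F(x):
    # Hypothetical sample map R^2 -> R^3: (x, y) |-> (x*y, x + y^2, y)
    return [x[0] * x[1], x[0] + x[1] ** 2, x[1]]


J = jacobian(F, [2.0, 3.0])
for row in J:
    print(row)
```

At $p = (2, 3)$ the exact matrix is $\begin{pmatrix} 3 & 2 \\ 1 & 6 \\ 0 & 1 \end{pmatrix}$, and the printed rows should agree with it up to rounding error.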
Let's now specialize to the case $m = n$. Psychologically, how does one intuit these functions $F\colon \mathbb{R}^n \to \mathbb{R}^n$? There are two usual answers:
(1) We intuit $F\colon \mathbb{R}^n \to \mathbb{R}^n$ as a map between two different spaces. Points from the domain space get sent to points in the codomain space.
(2) We intuit $F\colon \mathbb{R}^n \to \mathbb{R}^n$ as a vector field. Every point in $\mathbb{R}^n$ is assigned an arrow in $\mathbb{R}^n$.
This distinction is important. When we generalize from $\mathbb{R}^n$ to abstract manifolds, these two ideas take on different forms, and so we end up with different concepts of "derivative."
In case (1), the maps $F\colon \mathbb{R}^m \to \mathbb{R}^n$ generalize to smooth maps between manifolds $F \colon M \to N$. In this setting, the concept of "total derivative" generalizes nicely to "pushforward." That is, it makes sense to talk about the pushforward of a smooth map $F \colon M \to N$.
But you asked about vector fields, which brings us to case (2). In this case, we first have to be careful about what we mean by "vector" and "vector field."
A vector $v_p \in T_pM$ at a point $p$ is (as you say) a directional derivative operator at the point $p$. This means that $v_p$ inputs a scalar field $f\colon M \to \mathbb{R}$ and outputs a real number $v_p(f) \in \mathbb{R}$.
A vector field $v$ on $M$ is a map which associates to each point $p \in M$ a vector $v_p \in T_pM$. This means that a vector field defines a derivative operator at each point.
Therefore: a vector field $v$ can be regarded as an operator which inputs scalar fields $f\colon M \to \mathbb{R}$ and outputs scalar fields $v(f)\colon M \to \mathbb{R}$.
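In this setting, it no longer makes sense to talk about the "total derivative" of a vector field. You've said it yourself: what would it even mean to talk about "derivatives" of vectors, anyway? On an abstract manifold, the vectors $v_p$ and $v_q$ at two different points live in different tangent spaces $T_pM$ and $T_qM$, so a difference quotient built from $v_q - v_p$ is not defined without extra structure. We'll need to go a different route.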
In differential geometry, there are two ways of talking about the derivative of a vector field with respect to another vector field:
- Connections (usually denoted $\nabla_wv$ or $D_wv$)
- Lie derivatives (usually denoted $\mathcal{L}_wv$ or $[w,v]$)
Intuitively, these notions capture the idea of "infinitesimal rate of change of a vector field $v$ in the direction of a vector field $w$."
Question: What do these constructions look like in $\mathbb{R}^n$?
Taking advantage of the fact that we're in $\mathbb{R}^n$, we can look at our vector fields in the calculus way: as functions $v\colon \mathbb{R}^n \to \mathbb{R}^n$. As such, we can write the components as $v = (v^1,\ldots, v^n)$.
The covariant derivative of $v$ with respect to $w$ (using the Levi-Civita connection of the Euclidean metric) is defined as
$$\nabla_wv = (w(v^1), \ldots, w(v^n)),$$
where $$w(v^i) := w^1\frac{\partial v^i}{\partial x^1} + \ldots + w^n\frac{\partial v^i}{\partial x^n}.$$
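The componentwise formula above can be checked numerically. The following is only a sketch under assumptions: the function names `directional_derivative` and `covariant_derivative`, the step size `h`, and the two sample fields are hypothetical choices for illustration, and the partial derivatives are approximated by central differences.

```python
def directional_derivative(w, f, p, h=1e-6):
    """Approximate w(f) at p, i.e. the sum over j of w^j(p) * df/dx^j(p)."""
    w_p = w(p)
    total = 0.0
    for j in range(len(p)):
        forward = list(p)
        backward = list(p)
        forward[j] += h
        backward[j] -= h
        total += w_p[j] * (f(forward) - f(backward)) / (2 * h)
    return total


def covariant_derivative(w, v, p, h=1e-6):
    """Approximate (w(v^1), ..., w(v^n)) at p, the flat connection on R^n."""
    n = len(p)
    # Apply w, as a directional derivative, to each component function v^i.
    return [
        directional_derivative(w, lambda x, i=i: v(x)[i], p, h)
        for i in range(n)
    ]


def v(x):
    return [-x[1], x[0]]  # hypothetical sample field: rotation (-y, x)


def w(x):
    return [1.0, 0.0]  # hypothetical sample field: constant field d/dx


result = covariant_derivative(w, v, [2.0, 3.0])
print(result)
```

For these two fields the exact answer is $\nabla_w v = (0, 1)$ at every point: moving in the $x$-direction leaves the first component $-y$ unchanged and increases the second component $x$ at unit rate.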
The Lie derivative of $v$ with respect to $w$ has a technical definition in terms of flows that I don't want to go into, but the bottom line is that it's similar to Rod Carvalho's answer.
Also, in $\mathbb{R}^n$ we have the pleasant formula
$$\mathcal{L}_wv = \nabla_wv - \nabla_vw,$$
which aids in computation.
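As a quick worked example (the two fields are my own choice, picked only for illustration), take the rotation field $v = (-y, x)$ and the radial field $w = (x, y)$ on $\mathbb{R}^2$. Using $\nabla_wv = (w(v^1), w(v^2))$:

$$\nabla_wv = \left(x\frac{\partial(-y)}{\partial x} + y\frac{\partial(-y)}{\partial y},\ x\frac{\partial x}{\partial x} + y\frac{\partial x}{\partial y}\right) = (-y, x),$$

$$\nabla_vw = \left(-y\frac{\partial x}{\partial x} + x\frac{\partial x}{\partial y},\ -y\frac{\partial y}{\partial x} + x\frac{\partial y}{\partial y}\right) = (-y, x),$$

so that

$$\mathcal{L}_wv = \nabla_wv - \nabla_vw = (0, 0).$$

Here $\nabla_wv$ is nonzero while $\mathcal{L}_wv$ vanishes, so the two notions of "derivative of $v$ along $w$" really do measure different things. If instead $w = (1, 0)$, the same computation gives $\nabla_wv = (0, 1)$, $\nabla_vw = (0, 0)$, and hence $\mathcal{L}_wv = (0, 1)$.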
Best Answer
It depends on the level of abstraction you're working at. At the level of abstraction commonly used in, say, physics,
Technically, one could imagine calling a function $f : X \to V$, where $V$ is a finite-dimensional vector space, a "vector field," but this would be nonstandard. Such a function corresponds to a section of the trivial bundle with fiber $V$, which in general differs from the tangent bundle of $X$.
As for your third question, yes, you're technically right, but the point of doing this is to think about a vector field as an object "living on" $X$ in some sense. This is a bit vague but it becomes a lot clearer when considering nontrivial tangent bundles.