I think the simplest motivation is that of vector fields. We want to be able to assign a tangent vector to each point in the manifold, giving us a "field" of vectors on the manifold. That is, $F$ should be some map such that
$$ F(p)\in T_pM $$
for $p\in M$. So, what's wrong with just saying that? Well, nothing really, if you're only interested in the value of a vector field at a single point. If you ever want to look beyond individual points, you need some structure connecting your different tangent spaces. For example, to look at continuity of $F$, we need the space of outputs of $F$ to have a topological structure; to look at differentiability, we need a differentiable structure.
So, we need some space which
1. contains all the tangent vectors of $M$
2. has the same "level" of structure that $M$ has
To satisfy (1), we simply glue together all the tangent spaces by taking a union. Since the tangent spaces at distinct points are a priori unrelated to one another, we make this explicit by using a disjoint union
$$ TM = \coprod\limits_{p\in M} T_pM $$
The rest of the bundle structure is just there to "lift" the structure of $M$ onto $TM$. With that, we can now define vector fields as functions in the usual way
$$ F: M\to TM $$
and we are able to discuss continuity and differentiability to the extent that $M$ admits such properties. However, this definition isn't "complete" because it allows for e.g. attaching a tangent vector from $T_qM$ to $p$, which doesn't fit our idea of a vector field. This gives us another requirement,
- we need a way to determine which point a vector is tangent to
This is the bundle projection map, $\pi: TM\to M$, and so we can add the requirement on $F$ that $\pi(F(p)) = p$ everywhere.
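The section condition can be made concrete with a small sketch. Here $M$ is taken to be the unit circle in $\mathbb{R}^2$ and an element of $TM$ is modelled as a pair `(base_point, vector)`; the names `pi`, `rotation_field`, `misattached_field` and `is_section` are illustrative choices, not standard API.

```python
import math

def pi(element):
    """Bundle projection: forget the vector, keep the base point."""
    base_point, _vector = element
    return base_point

def rotation_field(p):
    """A genuine vector field: attaches to p a vector tangent to the circle at p."""
    x, y = p
    return (p, (-y, x))

def misattached_field(p):
    """Not a vector field: always returns a vector living in T_qM for a fixed q."""
    q = (1.0, 0.0)
    return (q, (0.0, 1.0))

def is_section(F, sample_points):
    """Check pi(F(p)) == p on a finite sample of points (a spot check, not a proof)."""
    return all(pi(F(p)) == p for p in sample_points)

# A few sample points on the unit circle (angles chosen arbitrarily).
sample_points = [(math.cos(t), math.sin(t)) for t in (0.3, 1.0, 2.5, 4.0)]

print(is_section(rotation_field, sample_points))
print(is_section(misattached_field, sample_points))
```

Both maps are perfectly good functions $M\to TM$; only the first passes the check $\pi(F(p)) = p$.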
Here, we created a bundle with a base manifold and a vector space at each point, but we could imagine a more general concept of bundle which just has some kind of space $B$ with some other kind of space $F_p B$ at each point $p\in B$. Even in this general setting, we can see the utility of attaching to each point of the base space some element of its attached space. We define a function
$$ \sigma: B\to FB = E $$
with $\pi(\sigma(p)) = p$ as a cross-section (or just section) of the total space $E$.
Applying this terminology to our original example, we can then reconstruct the more terse definition:
$$ \text{a vector field on } M \text{ is a section }\sigma\text{ of the tangent bundle } TM$$
1) Taking directional derivatives allows you to do differential calculus on manifolds. One explicit example is defining vector fields, i.e. maps $X:M\to TM:=\sqcup_{p\in M}T_pM$ such that $\pi\circ X=\mathrm{id}_M$, where $\pi:TM\to M$ is the canonical projection, and integrating them to get flow maps, i.e. maps $\varphi:\mathbb{R}\times M\to M$ such that $\varphi(0,\cdot)=\mathrm{id}_M$ and $\left.\frac{\partial\varphi(\cdot,x)}{\partial t}\right|_t=X_{\varphi(t,x)}$ (the flow is defined for all $t\in\mathbb{R}$ when $X$ is complete, for instance when $M$ is compact). Thus, from linear data ($X$), you recover a family of diffeomorphisms of $M$ with a prescribed behaviour.
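To see the passage from a vector field to its flow numerically, here is a minimal sketch assuming $M=\mathbb{R}^2$ and the field $X(x,y)=(-y,x)$, whose exact flow is rotation by the angle $t$. The fourth-order Runge-Kutta integrator and the names `flow` and `exact_flow` are choices made for this illustration only.

```python
import math

def X(p):
    """Vector field X(x, y) = (-y, x) on R^2 (chosen for illustration)."""
    x, y = p
    return (-y, x)

def flow(t, p, steps=1000):
    """Approximate phi(t, p) by integrating X with classical RK4."""
    h = t / steps
    x, y = p
    for _ in range(steps):
        k1 = X((x, y))
        k2 = X((x + 0.5 * h * k1[0], y + 0.5 * h * k1[1]))
        k3 = X((x + 0.5 * h * k2[0], y + 0.5 * h * k2[1]))
        k4 = X((x + h * k3[0], y + h * k3[1]))
        x += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return (x, y)

def exact_flow(t, p):
    """Closed-form flow of X: rotation of p by the angle t."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t))

p = (1.0, 0.5)
print(flow(0.0, p))                        # phi(0, .) is the identity
print(flow(1.2, p), exact_flow(1.2, p))    # numerical vs. exact flow
print(flow(0.7, flow(0.5, p)))             # close to flow(1.2, p)
```

The last line illustrates the group property $\varphi(s,\varphi(t,x))=\varphi(s+t,x)$, which is what makes each $\varphi(t,\cdot)$ a diffeomorphism with inverse $\varphi(-t,\cdot)$.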
2) If your manifold $S$ is a submanifold of an ambient one $M$, the inclusion $i:S\to M$ induces an injective linear map $di_p:T_pS\to T_pM$ which allows you to consider the tangent space of $S$ at $p$ as a linear subspace of the tangent space of $M$ at $p$. There is another identification, for tangent vectors of affine manifolds (that is, $M=\mathbb{R}^n$ with the maximal atlas induced by $\mathcal{A}=\{(\mathrm{id}_{\mathbb{R}^n},\mathbb{R}^n)\}$), which identifies them with actual vectors of $\mathbb{R}^n$: it is given by $\mathbb{R}^n\ni v\mapsto\partial_v\in T_p\mathbb{R}^n$, where $\partial_v$ acts on functions $f\in C^\infty_p(\mathbb{R}^n)$ by
$$\partial_vf=\lim\limits_{t\to 0}\frac{f(p+tv)-f(p)}{t}.$$
In other words, you identify the vector $v$ with the directional derivative in the direction $v$. So when you have a submanifold $S$ of an affine one, you can:
- Identify a tangent vector of $S$ with a tangent vector of $\mathbb{R}^n$.
- Identify the tangent vector of $\mathbb{R}^n$ with an actual vector of $\mathbb{R}^n$.
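The two identifications can be checked numerically. The sketch below assumes $S$ is the unit sphere in $\mathbb{R}^3$; the curve `gamma`, the test function `f` and the angle `A` are arbitrary choices for illustration. The velocity of a curve in $S$ is computed as an actual vector $v\in\mathbb{R}^3$, and the derivative of $f$ along the curve is compared with the difference quotient defining $\partial_v f$.

```python
import math

A = 0.7  # arbitrary angle fixing the direction of the curve

def gamma(t):
    """A great circle on the unit sphere through the north pole p = (0, 0, 1)."""
    return (math.sin(t) * math.cos(A), math.sin(t) * math.sin(A), math.cos(t))

def f(q):
    """A smooth test function on R^3 (chosen for illustration)."""
    x, y, z = q
    return x * y + 2.0 * x + z * z

def central_difference(g, h=1e-5):
    """Symmetric difference quotient approximating g'(0)."""
    return (g(h) - g(-h)) / (2 * h)

p = gamma(0.0)
# Velocity of gamma at t = 0, viewed as an actual vector of R^3.
v = tuple(central_difference(lambda t, i=i: gamma(t)[i]) for i in range(3))

# Tangent vectors to the sphere at p are orthogonal to p.
p_dot_v = sum(pc * vc for pc, vc in zip(p, v))

# Derivative of f along the curve in S versus along the straight line p + t v.
along_curve = central_difference(lambda t: f(gamma(t)))
along_line = central_difference(lambda t: f(tuple(pc + t * vc for pc, vc in zip(p, v))))

print(p_dot_v, along_curve, along_line)
```

Up to discretisation error the two derivatives agree, even though the straight line $p+tv$ leaves the sphere: only the velocity at $t=0$ matters.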
3) Again, taking directional derivatives on a manifold is what allows one to do differential calculus on manifolds, and to use theorems such as the implicit function theorem or the inverse function theorem. As for the identification of the two definitions, I address it in 4).
4) You answer your own question by pointing out the identification $[\gamma]\mapsto D_\gamma$, but you have to be careful that this does not depend on the choice of the representative $\gamma$. But since
$$(f\circ\gamma)'(0)=(f\circ\varphi^{-1}\circ\varphi\circ\gamma)'(0)=d(f\circ\varphi^{-1})_{\varphi\circ\gamma(0)}\left((\varphi\circ\gamma)'(0)\right)$$
by the chain rule, and since equivalent curves have the same $(\varphi\circ\gamma)'(0)$ by the definition of the equivalence relation, this is indeed the case.
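As a concrete check (a sketch with curves and a function chosen purely for illustration, taking $M=\mathbb{R}^2$ with the identity chart), let $\gamma_1(t)=(1+t,\,2-t)$ and $\gamma_2(t)=(1+\sin t,\,2-t+3t^2)$. Both satisfy $\gamma_i(0)=(1,2)$ and $\gamma_i'(0)=(1,-1)$, so $[\gamma_1]=[\gamma_2]$. For $f(x,y)=x^2y$:
$$(f\circ\gamma_1)'(0)=\left.\left[2(1+t)(2-t)-(1+t)^2\right]\right|_{t=0}=3,$$
$$(f\circ\gamma_2)'(0)=\left.\left[2(1+\sin t)\cos t\,(2-t+3t^2)+(1+\sin t)^2(-1+6t)\right]\right|_{t=0}=3,$$
and both agree with $df_{(1,2)}\big((1,-1)\big)=4\cdot 1+1\cdot(-1)=3$, as the chain rule predicts.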
Best Answer
To keep things simple, let's stick to $3$ dimensions, so that it looks like familiar "ordinary vector calculus".
If $\sigma(t) = (x(t),y(t),z(t))$ is a curve, then its tangent vector is $\sigma'(t) = (x'(t),y'(t),z'(t))$. Let $p=\sigma(0)$, and $v = \sigma'(0)$.
By the chain rule from "ordinary vector calculus", the right-hand side of the equation you give above is:
$$ \frac{d}{dt} f(\sigma(t)) = \frac{d}{dt}f(x(t),y(t),z(t)) = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} $$
and then of course we evaluate at $t=0$. The left-most term in the equation above, $\left. \frac{d}{dt}f(\sigma(t)) \right|_{t=0}$, is by assumption the vector $v$ acting on the function $f$. On the other hand, the right-most part of the equation looks like a dot product:
$$ \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = \nabla f \cdot \sigma' $$
This is also the "ordinary vector calculus" definition of the directional derivative in the direction $v$:
$$ D_v f(p) = \nabla f(p) \cdot v $$
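The identity can be sanity-checked numerically. In the sketch below the helix `sigma`, the function `f` and its hand-computed gradient `grad_f` are arbitrary choices for illustration; the derivative of $f\circ\sigma$ at $t=0$ is approximated by a symmetric difference quotient and compared with $\nabla f(p)\cdot v$.

```python
import math

def sigma(t):
    """A helix in R^3 (chosen for illustration)."""
    return (math.cos(t), math.sin(t), t)

def sigma_prime(t):
    """Its tangent vector, differentiated by hand."""
    return (-math.sin(t), math.cos(t), 1.0)

def f(x, y, z):
    """A smooth test function on R^3."""
    return x * y + y * z ** 2 + 3.0 * z

def grad_f(x, y, z):
    """Gradient of f, computed by hand."""
    return (y, x + z ** 2, 2.0 * y * z + 3.0)

p = sigma(0.0)        # p = (1, 0, 0)
v = sigma_prime(0.0)  # v = (0, 1, 1)

h = 1e-5
# Left-hand side: d/dt f(sigma(t)) at t = 0, via a symmetric difference quotient.
lhs = (f(*sigma(h)) - f(*sigma(-h))) / (2 * h)
# Right-hand side: the dot product grad f(p) . v.
rhs = sum(g * c for g, c in zip(grad_f(*p), v))

print(lhs, rhs)
```

Here $\nabla f(p)=(0,1,3)$ and $v=(0,1,1)$, so the dot product is $4$, and the difference quotient agrees with it up to discretisation error.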