1) Taking directional derivatives allows you to do differential calculus on manifolds. One explicit example is defining vector fields, i.e. smooth maps $X:M\to TM:=\sqcup_{p\in M}T_pM$ such that $\pi\circ X=\mathrm{id}_M$, where $\pi:TM\to M$ is the canonical projection, and integrating them to get flow maps, i.e. maps $\varphi:\mathbb{R}\times M\to M$ such that $\varphi(0,\cdot)=\mathrm{id}_M$ and $\left.\frac{\partial\varphi(\cdot,x)}{\partial t}\right|_t=X_{\varphi(t,x)}$. (In general the flow is only defined for $t$ near $0$; it is defined on all of $\mathbb{R}\times M$ when, for instance, $M$ is compact.) Thus, from linear data ($X$), you recover a family of diffeomorphisms of $M$ with a prescribed behaviour.
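To make the passage from $X$ to $\varphi$ concrete, here is a minimal numerical sketch, assuming $M=\mathbb{R}^2$ and the rotation field $X(x,y)=(-y,x)$; neither this field nor the helper names (`rk4_step`, `flow`, `exact_flow`) come from the answer above. It integrates the field with a classical Runge-Kutta scheme and compares the result with the closed-form flow, which is rotation by the angle $t$.

```python
import math

def X(p):
    """Illustrative vector field on R^2: X(x, y) = (-y, x)."""
    x, y = p
    return (-y, x)

def rk4_step(field, p, h):
    """One classical Runge-Kutta step for the ODE p' = field(p)."""
    def shifted(q, k, s):
        return tuple(qi + s * ki for qi, ki in zip(q, k))
    k1 = field(p)
    k2 = field(shifted(p, k1, h / 2))
    k3 = field(shifted(p, k2, h / 2))
    k4 = field(shifted(p, k3, h))
    return tuple(
        pi + h / 6 * (a + 2 * b + 2 * c + d)
        for pi, a, b, c, d in zip(p, k1, k2, k3, k4)
    )

def flow(field, t, p, steps=1000):
    """Approximate phi(t, p) by integrating p' = field(p) from time 0 to t."""
    h = t / steps
    for _ in range(steps):
        p = rk4_step(field, p, h)
    return p

def exact_flow(t, p):
    """Closed-form flow of X: rotation of the plane by the angle t."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

p0 = (1.0, 0.5)
approx = flow(X, 1.3, p0)
exact = exact_flow(1.3, p0)
print(max(abs(a - e) for a, e in zip(approx, exact)))  # small discretisation error
```

For this field the flow is defined for all times and satisfies $\varphi(s+t,\cdot)=\varphi(s,\cdot)\circ\varphi(t,\cdot)$, so each $\varphi(t,\cdot)$ is a diffeomorphism with inverse $\varphi(-t,\cdot)$.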
2) If your manifold $S$ is a submanifold of an ambient one $M$, the inclusion $i:S\to M$ induces an injective linear map $di_p:T_pS\to T_pM$, which allows you to regard the tangent space of $S$ at $p$ as a linear subspace of the tangent space of $M$ at $p$. There is another identification, for tangent vectors of affine manifolds (that is, $M=\mathbb{R}^n$ with the maximal atlas induced by $\mathcal{A}=\{(\mathrm{id}_{\mathbb{R}^n},\mathbb{R}^n)\}$), which identifies them with actual vectors of $\mathbb{R}^n$: it is given by $\mathbb{R}^n\ni v\mapsto\partial_v\in T_p\mathbb{R}^n$, where $\partial_v$ acts on functions $f\in C^\infty_p(\mathbb{R}^n)$ by
$$\partial_vf=\lim\limits_{t\to 0}\frac{f(p+tv)-f(p)}{t}.$$
In other words, you identify the vector $v$ with the directional derivative in the direction $v$. So when you have a submanifold $S$ of an affine one, you can:
- Identify a tangent vector of $S$ as a tangent vector of $\mathbb{R}^n$
- Identify the tangent vector of $\mathbb{R}^n$ with an actual vector of $\mathbb{R}^n$.
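As an illustration that is not in the original answer, take $S=S^{n-1}\subset\mathbb{R}^n$, the unit sphere, and describe tangent vectors as velocities of curves. If $\gamma$ is a smooth curve in $S^{n-1}$ with $\gamma(0)=p$, differentiating $\langle\gamma(t),\gamma(t)\rangle=1$ at $t=0$ gives $2\langle\gamma'(0),p\rangle=0$, so after the two identifications every tangent vector of the sphere at $p$ becomes an actual vector orthogonal to $p$. Both spaces have dimension $n-1$, hence
$$di_p\left(T_pS^{n-1}\right)\cong\{v\in\mathbb{R}^n:\langle v,p\rangle=0\}.$$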
3) Again, taking directional derivatives on a manifold lets you do differential calculus on manifolds, which gives access to useful theorems such as the implicit function theorem or the inverse function theorem. As for the identification of the two definitions, I address it in 4).
4) You answer your own question by pointing out the identification $[\gamma]\mapsto D_\gamma$, but you have to be careful that this does not depend on the choice of the representative $\gamma$. But since
$$(f\circ\gamma)'(0)=(f\circ\varphi^{-1}\circ\varphi\circ\gamma)'(0)=d(f\circ\varphi^{-1})_{\varphi\circ\gamma(0)}\left((\varphi\circ\gamma)'(0)\right)$$
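by the chain rule, it is clear from the definition of the equivalence relation that this is the case: two equivalent curves have the same $\gamma(0)$ and the same $(\varphi\circ\gamma)'(0)$, so the right-hand side is the same for both.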
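The independence from the representative can also be checked numerically. The sketch below assumes $M=\mathbb{R}^2$ with the identity chart; the curves `gamma1`, `gamma2`, `gamma3` and the test function `f` are illustrative choices, not taken from the question. The first two curves pass through the origin with the same velocity $(1,0)$, so they lie in the same class, while the third has velocity $(1,1)$.

```python
import math

def derivative_at_zero(g, h=1e-5):
    """Central-difference approximation of g'(0) for g: R -> R."""
    return (g(h) - g(-h)) / (2 * h)

def D(gamma, f):
    """D_gamma(f) = (f o gamma)'(0), approximated numerically."""
    return derivative_at_zero(lambda t: f(*gamma(t)))

def f(x, y):
    """Illustrative smooth function with gradient (1, 3) at the origin."""
    return math.exp(x) + 3 * y + x * y

def gamma1(t):
    return (t, t ** 2)                       # velocity (1, 0) at t = 0

def gamma2(t):
    return (math.sin(t), 1 - math.cos(t))    # velocity (1, 0) at t = 0

def gamma3(t):
    return (t, t)                            # velocity (1, 1) at t = 0

print(D(gamma1, f), D(gamma2, f), D(gamma3, f))
```

The first two values agree (both are approximately $1$) even though the curves differ away from $t=0$, while the third is approximately $4$, as the chain rule predicts from the gradient $(1,3)$.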
Short answer: The elements of $C^\infty(M)$ are smooth functions from $M$ to $\mathbb R$; there is no equivalence relation involved.
I think you're misreading the Wikipedia article. It doesn't say "$C^\infty(M)$ is the space of germs of smooth functions defined on the entirety of $M$." What it actually says (with some irrelevant intervening text deleted) is
> the subset $C^{\infty }(X,Y)$ ... of smooth functions ... can be defined, and then spaces of germs of ... smooth ... functions can be constructed.
In the special case that $X$ is a smooth manifold and $Y=\mathbb R$, what this means is that for each open subset $U\subseteq X$, we define $C^\infty(U)$ to mean the set of smooth functions from $U$ to $\mathbb R$, and then for each $x\in X$, we use the equivalence relation described earlier in the article to construct the space $C^\infty_x$ of germs of smooth functions at $x$.
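You could define it either way. A priori, you might worry that these are different for two reasons:
(1) a derivation defined on global functions might take different values on two functions that agree near $p$, so it would not descend to germs;
(2) a germ at $p$ might not be the germ of any function defined on all of $M$.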
As you already understand, the question linked above shows that there is nothing to worry about in (1); $Df = Dg$ as soon as $f$ and $g$ agree in a neighborhood of $p$.
As for (2), the point is that there will always be something in the equivalence class of your germ which extends to all of $M$, although you may have to shrink the neighborhood to get it. (For instance, in the given example, restrict the tangent function to $(-\pi/2 + \epsilon, \pi/2 - \epsilon)$ and then extend it to $\mathbb{R}$ by smoothly extending the graph however you like. You know you can do this in general thanks to the existence of bump functions and hat functions.)
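The extension in the tangent-function example can be written down explicitly. The following is a sketch under the usual $e^{-1/t}$ construction of a smooth cutoff; the names `h`, `cutoff`, `extended_tan` and the radii `a=1.0`, `b=1.5` are illustrative assumptions, and any $0<a<b<\pi/2$ would do.

```python
import math

def h(t):
    """Smooth function on R: exp(-1/t) for t > 0, and 0 for t <= 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def cutoff(x, a, b):
    """Smooth cutoff: equals 1 for |x| <= a and 0 for |x| >= b (needs 0 < a < b)."""
    u = h(b - abs(x))   # positive exactly when |x| < b
    v = h(abs(x) - a)   # positive exactly when |x| > a
    return u / (u + v)  # since a < b, u and v never vanish simultaneously

def extended_tan(x, a=1.0, b=1.5):
    """Smooth function on all of R that agrees with tan on [-a, a].

    Needs b < pi/2, so that tan is finite wherever the cutoff is nonzero.
    """
    if abs(x) >= b:
        return 0.0
    return cutoff(x, a, b) * math.tan(x)

print(extended_tan(0.3) == math.tan(0.3), extended_tan(math.pi / 2))
```

Since `extended_tan` and `tan` coincide on the neighborhood $(-1,1)$ of $0$, they define the same germ at $0$, and `extended_tan` is a globally defined representative of it.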
So the definitions are equivalent. I guess it's reasonable to argue: "why did you make me think about this abstract thing like a germ, if you didn't need it for the definition?" The point is that derivations act a bit more naturally on germs than on global functions, and in particular, this generalizes better to other types of geometric structures.