1) Taking directional derivatives allows you to do differential calculus on manifolds. One explicit example is defining tangent vector fields, i.e. maps $X:M\to TM:=\sqcup_{p\in M}T_pM$ such that $\pi\circ X=\mathrm{id}_M$, where $\pi:TM\to M$ is the canonical projection, and integrating them to get flow maps, i.e. maps $\varphi:\mathbb{R}\times M\to M$ such that $\varphi(0,\cdot)=\mathrm{id}_M$ and $\left.\frac{\partial\varphi(\cdot,x)}{\partial t}\right|_t=X_{\varphi(t,x)}$ (the flow is defined for all $t\in\mathbb{R}$ when $X$ is complete, e.g. when $M$ is compact; in general it exists only locally). Thus, from linear data ($X$), you recover a family of diffeomorphisms of $M$ with a prescribed behaviour.
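To make the passage from a vector field to its flow concrete, here is a minimal numerical sketch. Everything in it is my own illustration: I take $M=\mathbb{R}^2$, the hypothetical field $X(x,y)=(-y,x)$ (whose exact flow is rotation by angle $t$), and a classical Runge-Kutta integrator as one possible way to approximate $\varphi(t,\cdot)$.

```python
import math

def X(p):
    """Hypothetical vector field on R^2 (an assumption): X(x, y) = (-y, x)."""
    x, y = p
    return (-y, x)

def rk4_step(field, p, h):
    """One classical Runge-Kutta step for the ODE p' = field(p)."""
    def shift(q, k, s):
        return tuple(qi + s * ki for qi, ki in zip(q, k))
    k1 = field(p)
    k2 = field(shift(p, k1, h / 2))
    k3 = field(shift(p, k2, h / 2))
    k4 = field(shift(p, k3, h))
    return tuple(
        qi + h / 6 * (a + 2 * b + 2 * c + d)
        for qi, a, b, c, d in zip(p, k1, k2, k3, k4)
    )

def flow(t, p, steps=1000):
    """Approximate phi(t, p) by integrating X from time 0 to time t."""
    h = t / steps
    for _ in range(steps):
        p = rk4_step(X, p, h)
    return p

# phi(0, .) is the identity, and phi(s, phi(t, .)) agrees with phi(s + t, .)
start = (1.0, 0.0)
quarter_turn = flow(math.pi / 2, start)
composed = flow(0.3, flow(0.4, start))
direct = flow(0.7, start)
```

Up to integration error, `quarter_turn` should be close to $(0,1)$ and `composed` close to `direct`, which is the group property that makes each $\varphi(t,\cdot)$ a diffeomorphism with inverse $\varphi(-t,\cdot)$.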
2) If your manifold $S$ is a submanifold of an ambient one $M$, the inclusion $i:S\to M$ induces a map $di_p:T_pS\to T_pM$ which allows you to consider the tangent space of $S$ at $p$ as a linear subspace of the tangent space of $M$ at $p$. There is another identification, for tangent vectors of affine manifolds (that is, $M=\mathbb{R}^n$ with the maximal atlas induced by $\mathcal{A}=\{(\mathrm{id}_{\mathbb{R}^n},\mathbb{R}^n)\}$), which identifies them with actual vectors of $\mathbb{R}^n$: it is given by $\mathbb{R}^n\ni v\mapsto\partial_v\in T_p\mathbb{R}^n$, where $\partial_v$ acts on functions $f\in C^\infty_p(\mathbb{R}^n)$ by
$$\partial_vf=\lim\limits_{t\to 0}\frac{f(p+tv)-f(p)}{t}.$$
In other words, you identify the vector $v$ with the directional derivative in the direction $v$. So when you have a submanifold $S$ of an affine one, you can:
- identify a tangent vector of $S$ with a tangent vector of $\mathbb{R}^n$ (via $di_p$);
- identify that tangent vector of $\mathbb{R}^n$ with an actual vector of $\mathbb{R}^n$.
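A small numerical sketch of these two identifications, under assumptions of my own choosing: $S$ is the unit sphere in $\mathbb{R}^3$, $p=(1,0,0)$, `gamma` is a hypothetical curve on the sphere through $p$, and `f` is an arbitrary smooth test function. The velocity of the curve is an honest vector $v\in\mathbb{R}^3$ (orthogonal to $p$ here), and differentiating $f$ along the curve gives the same number as $\partial_vf$.

```python
import math

def gamma(t):
    """Hypothetical curve on the unit sphere with gamma(0) = (1, 0, 0)."""
    return (
        math.cos(t) * math.cos(2 * t),
        math.sin(t) * math.cos(2 * t),
        math.sin(2 * t),
    )

def f(x, y, z):
    """Arbitrary smooth test function on R^3 (an assumption)."""
    return x * y + math.exp(z) + x * x

def diff(g, step=1e-5):
    """Central-difference approximation of g'(0)."""
    return (g(step) - g(-step)) / (2 * step)

p = gamma(0.0)

# Step 1 and 2: the tangent vector of S, read as a vector v of R^3
v = tuple(diff(lambda t, i=i: gamma(t)[i]) for i in range(3))

# Derivative of f along the curve on S
along_curve = diff(lambda t: f(*gamma(t)))

# Directional derivative of f at p in the direction v, computed in R^3
directional = diff(lambda t: f(*(a + t * b for a, b in zip(p, v))))

# v lies in the tangent plane of the sphere at p, i.e. v is orthogonal to p
dot_with_p = sum(a * b for a, b in zip(p, v))
```

Here `along_curve` and `directional` agree up to finite-difference error, and `dot_with_p` vanishes, so $v$ really sits in the linear subspace $di_p(T_pS)\subset\mathbb{R}^3$.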
3) Again, taking directional derivatives on a manifold lets you do differential calculus there, which gives access to useful theorems such as the implicit function theorem or the inverse function theorem. I address the identification of the two definitions in 4).
4) You answer your own question by pointing out the identification $[\gamma]\mapsto D_\gamma$, but you have to be careful that this does not depend on the choice of the representative $\gamma$. But since
$$(f\circ\gamma)'(0)=(f\circ\varphi^{-1}\circ\varphi\circ\gamma)'(0)=d(f\circ\varphi^{-1})_{\varphi\circ\gamma(0)}\left((\varphi\circ\gamma)'(0)\right)$$
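by the chain rule, it is clear from the definition of the equivalence relation that this is the case: the right-hand side depends on $\gamma$ only through $(\varphi\circ\gamma)'(0)$, which is the same for all representatives of $[\gamma]$.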
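As a sanity check of this independence, here is a minimal sketch in the simplest setting I could think of: $M=\mathbb{R}^2$ with the identity chart, and two hypothetical curves `gamma1`, `gamma2` through the same point with the same velocity but different higher-order behaviour. The test function `f` is also just an assumption for illustration.

```python
import math

p = (0.5, -1.0)   # base point (arbitrary choice)
v = (2.0, 3.0)    # common velocity at t = 0 (arbitrary choice)

def gamma1(t):
    """Straight line through p with velocity v."""
    return (p[0] + t * v[0], p[1] + t * v[1])

def gamma2(t):
    """Same point and velocity at t = 0, different second-order terms."""
    return (p[0] + t * v[0] + t ** 2, p[1] + t * v[1] - math.sin(t) ** 2)

def f(x, y):
    """Arbitrary smooth test function (an assumption)."""
    return math.sin(x) * y + x * x

def D(gamma, func, step=1e-5):
    """Central-difference approximation of (func o gamma)'(0)."""
    return (func(*gamma(step)) - func(*gamma(-step))) / (2 * step)

d1 = D(gamma1, f)
d2 = D(gamma2, f)

# Value predicted by the chain rule: df_p(v) with the identity chart
exact = (math.cos(p[0]) * p[1] + 2 * p[0]) * v[0] + math.sin(p[0]) * v[1]
```

Both `d1` and `d2` match `exact` up to finite-difference error, as the chain-rule formula predicts for two representatives of the same class.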
Your derivation is correct.
An informal way of thinking about it: you are solving for $X$ such that $(P + \epsilon X)^2 = P+\epsilon X$, where you can treat $\epsilon^2$ as zero. This gives the same equation for the tangent space at $P$.
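Spelled out, using only $P^2=P$ and dropping the $\epsilon^2$ term, the first-order computation reads as follows (I am assuming this is the equation you arrived at):

$$(P+\epsilon X)^2=P^2+\epsilon(PX+XP)+\epsilon^2X^2\equiv P+\epsilon(PX+XP)\pmod{\epsilon^2},$$

so comparing with $P+\epsilon X$ gives the linear condition

$$PX+XP=X.$$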
As noted in the comments, the manifold actually has four components of different dimensions (two of which are just points). Your appeal to the regular value theorem is correct and proves that each component is a manifold (there are two zero-dimensional components and two four-dimensional components).
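The dimension count can be checked by hand on diagonal representatives. The sketch below assumes the matrices are $3\times 3$ (which is what the dimensions $0,4,4,0$ suggest) and uses the linearised condition $PX+XP=X$ coming from the first-order expansion of $(P+\epsilon X)^2=P+\epsilon X$. For $P=\mathrm{diag}(p_1,p_2,p_3)$ with entries in $\{0,1\}$, that condition reads $(p_i+p_j-1)X_{ij}=0$ entry by entry, so $X_{ij}$ is free exactly when $p_i+p_j=1$. The function name `tangent_dim` is mine.

```python
def tangent_dim(diag):
    """Dimension of {X : PX + XP = X} for P = diag(diag) with 0/1 entries.

    For diagonal P the condition is (p_i + p_j - 1) * X_ij = 0,
    so the entry X_ij is free exactly when p_i + p_j == 1.
    """
    return sum(1 for pi in diag for pj in diag if pi + pj == 1)

# One diagonal idempotent of each rank 0, 1, 2, 3 (assuming 3x3 matrices)
dims = {rank: tangent_dim([1] * rank + [0] * (3 - rank)) for rank in range(4)}
```

Since every idempotent matrix is conjugate to a diagonal one of the same rank, and conjugation preserves the set of idempotents, these numbers give the dimension of each component.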
Best Answer
Yes, it's because of the definition of regular value (or regular level set). Let $\phi$ be a local parametrization of $f^{-1}(0)$ in a neighborhood of $p$ with $\phi(0)=p$. Then $T_pf^{-1}(0) = \text{im}(d\phi_0)$ is a $2$-dimensional subspace of $\Bbb R^3$.
Now we have $h\circ\phi = 0$ (since $h\big|_{f^{-1}(0)} = 0$). By the chain rule, $$dh_p\circ d\phi_0 = 0,$$ which means that $dh_p\big|_{T_pf^{-1}(0)} = 0$, as you asked.
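Here is a quick numerical illustration of the argument, with choices that are entirely mine: $f(x,y,z)=x^2+y^2+z^2-1$ (so $f^{-1}(0)$ is the unit sphere), $h=f\cdot e^{x}$ as a function vanishing on $f^{-1}(0)$, $p=(0,0,1)$, and the graph parametrization $\phi(u,v)=(u,v,\sqrt{1-u^2-v^2})$.

```python
import math

def f(x, y, z):
    """Assumed defining function: 0 is a regular value, f^{-1}(0) is the sphere."""
    return x * x + y * y + z * z - 1.0

def h(x, y, z):
    """Assumed function vanishing on f^{-1}(0)."""
    return f(x, y, z) * math.exp(x)

def phi(u, v):
    """Local parametrization of the sphere near p = phi(0, 0) = (0, 0, 1)."""
    return (u, v, math.sqrt(1.0 - u * u - v * v))

def diff(g, step=1e-5):
    """Central-difference approximation of g'(0)."""
    return (g(step) - g(-step)) / (2 * step)

p = phi(0.0, 0.0)

# Columns of d(phi)_0: a basis of the tangent plane T_p f^{-1}(0)
tangent_basis = [
    tuple(diff(lambda t, i=i: phi(t, 0.0)[i]) for i in range(3)),
    tuple(diff(lambda t, i=i: phi(0.0, t)[i]) for i in range(3)),
]

def dh_p(w):
    """Approximate dh_p(w) as the derivative of t -> h(p + t w) at t = 0."""
    return diff(lambda t: h(*(a + t * b for a, b in zip(p, w))))

values_on_tangent = [dh_p(w) for w in tangent_basis]
value_on_normal = dh_p((0.0, 0.0, 1.0))
```

The entries of `values_on_tangent` vanish up to finite-difference error, while `value_on_normal` does not, so $dh_p$ kills exactly the directions coming from $\text{im}(d\phi_0)$ in this example.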