The strategy of the proof is to convert the statement about smooth maps between manifolds into a statement about linear maps between vector spaces. At this early point in the book, the authors have not yet presented a way to define $d(\phi^{-1})$: the domain $V$ of $\phi^{-1}: V \rightarrow U$ is an open subset of a manifold, whereas differentials have so far been defined only through difference quotients, which require the domain to be an open subset of a Euclidean space so that addition is defined; see the limit definition given on page 8 of your text.
Therefore, it is necessary to "straighten" $V$ by first extending $\phi^{-1}$ to a map between Euclidean spaces, and then defining the differential of the extension using difference quotients.
To address your second question, this proof shows precisely that $d\phi_0$ is a linear isomorphism between $\mathbb{R}^k$ and the tangent space $T_x(X)$. A linear isomorphism is smooth, and its inverse is linear and hence smooth as well, so it is also a diffeomorphism.
The idea is the following. If $U$ is an open subset of $\mathbb{R}^k$ containing $0$, then we know that $T_0U\cong \mathbb{R}^k$ as vector spaces. If we use the map $\phi:U\to X$ with $\phi(0)=x$ as our parametrization, then $d\phi_0$ maps $T_0U$ isomorphically onto its image. The best linear approximation to $\phi$ near $0$ is the first-order Taylor expansion $\phi(u)\approx\phi(0)+d\phi_0(u)$. The best linear approximation to $X$ as a submanifold of $\mathbb{R}^n$ near $x$ is the image of the tangent plane $T_0U\cong \mathbb{R}^k$ under this expansion. This is exactly the set of points $\phi(0)+d\phi_0(u)=x+d\phi_0(u)$ for all $u\in T_0U$.
So, we apply the linear transformation $d\phi_0:T_0U\to T_xX\subseteq \mathbb{R}^n$. Then we add $\phi(0)=x$ to shift the linear space $T_xX$ so that it is tangent to $X$ at $x$. As a very concrete example, take the manifold $S^1\subseteq \mathbb{R}^2$. Near $(0,1)$ it has graph coordinates $(x,\sqrt{1-x^2})$. That is, we can parametrize a neighborhood of $(0,1)\in S^1$ by the map $\phi:(-1,1)\to S^1$ given by $t\mapsto (t,\sqrt{1-t^2})$. If we visually inspect $S^1$ at $(0,1)$, we expect its best linear approximation to be the horizontal line $y=1$ passing through $(0,1)$. Call this line $A$.
The recipe given in Guillemin and Pollack says that we can find this line by calculating the Jacobian of the parametrization, $d\phi_0$, and then writing $A=(0,1)+d\phi_0 T_0(-1,1)$. Here $d\phi_0$ is the $2\times 1$ matrix
$$ d\phi_0=\begin{bmatrix}
\frac{\partial x}{\partial t}\\
\frac{\partial y}{\partial t}
\end{bmatrix}_{t=0}.$$
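Since $\partial x/\partial t=1$ and $\partial y/\partial t=-t/\sqrt{1-t^2}$, which vanishes at $t=0$, this is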
$$ d\phi_0=\begin{bmatrix}
1\\
0
\end{bmatrix}.$$
The moral is that the image of $T_0(-1,1)=\mathbb{R}$ is the $x$-axis:
$$ d\phi_0T_{0}(-1,1)=\{(x,y)\in \mathbb{R}^2:y=0\}.$$
Then if we add $(0,1)$ we get that $A$ is precisely the horizontal line passing through $(0,1)$. Indeed, here $T_x(X)$ is the $x$-axis, and the best linear approximation is the shifted $x$-axis.
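The circle computation above can also be checked numerically. The following is only a minimal sketch: the helper names (`differential_at`, `tangent_point`, `gap_ratio`) and the step size `h` are illustrative assumptions, and the Jacobian is estimated by central differences rather than computed symbolically.

```python
import math

def phi(t):
    """Graph parametrization of the unit circle near (0, 1)."""
    return (t, math.sqrt(1.0 - t * t))

def differential_at(f, t0, h=1e-6):
    """Central-difference estimate of the 2x1 Jacobian of f at t0 (h is an assumed step)."""
    fp, fm = f(t0 + h), f(t0 - h)
    return tuple((a - b) / (2.0 * h) for a, b in zip(fp, fm))

x = phi(0.0)                       # the base point (0, 1)
dphi0 = differential_at(phi, 0.0)  # should be close to the column (1, 0)

def tangent_point(s):
    """The point x + dphi_0(s) on the affine line A."""
    return tuple(xi + s * di for xi, di in zip(x, dphi0))

# Sample points of A; each should have second coordinate close to 1.
line_points = [tangent_point(s) for s in (-2.0, -0.5, 0.0, 1.0, 3.0)]

def gap_ratio(t):
    """Distance from phi(t) to x + dphi_0(t), divided by |t|."""
    return math.dist(phi(t), tangent_point(t)) / abs(t)

# For a first-order approximation this ratio should shrink as t -> 0.
ratios = [gap_ratio(t) for t in (0.1, 0.01, 0.001)]

print(dphi0)
print(line_points)
print(ratios)
```

The shrinking ratios are the numerical counterpart of saying that $A$ is the best linear approximation: the gap between the circle and the line goes to zero faster than $|t|$.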
In response to the second question, $x+T_x(X)$ is literally the affine subspace tangent to the manifold $X$ at the point $x$. $T_x(X)$ is the same set translated by $-x$ so that it passes through the origin. The advantage of calling $T_x(X)$ the tangent space is that, because it passes through the origin, it is a bona fide linear subspace of $\mathbb{R}^n$.
Best Answer
Hint: Fix a basis $\{v_1,\dots,v_k\}$ of $V$ and consider the following parametrization: $\Phi:\mathbb R^k\to V$ given by $(a_1,\dots,a_k)\mapsto \sum_{i=1}^k a_iv_i$.
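To see where the hint leads, here is a small numerical sketch. Everything concrete in it is an assumption made for illustration: the subspace $V\subseteq\mathbb{R}^3$, its basis, and the helper names `make_phi` and `jacobian_columns`. It estimates the Jacobian of $\Phi$ at $0$ by central differences and compares its columns with the basis vectors.

```python
def make_phi(basis):
    """Return Phi(a) = sum_i a_i * v_i for the given basis vectors v_i."""
    n = len(basis[0])
    def Phi(a):
        return tuple(sum(a_i * v[j] for a_i, v in zip(a, basis)) for j in range(n))
    return Phi

def jacobian_columns(f, a0, h=1e-6):
    """Central-difference estimate of the columns of the Jacobian of f at a0."""
    cols = []
    for i in range(len(a0)):
        plus, minus = list(a0), list(a0)
        plus[i] += h
        minus[i] -= h
        fp, fm = f(plus), f(minus)
        cols.append(tuple((p - m) / (2.0 * h) for p, m in zip(fp, fm)))
    return cols

# Hypothetical example: a 2-dimensional subspace V of R^3 with this basis.
basis = [(1.0, 0.0, 2.0), (0.0, 1.0, -1.0)]
Phi = make_phi(basis)

# Columns of dPhi_0; they should agree with the basis vectors of V.
cols = jacobian_columns(Phi, (0.0, 0.0))
print(cols)
```

Because $\Phi$ is linear, its differential at any point is $\Phi$ itself, so the columns of the Jacobian are the basis vectors and the image of $d\Phi_0$ is $V$.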