Here are a few quick comments. Let $T$ be the group of translations and $G = {\rm GL}(V)$, and note that their intersection is trivial. You can show geometrically that every affine transformation can be decomposed (uniquely) as a composite of a linear map fixing the origin followed by a translation.
So all elements have a decomposition as $tg$ with $t \in T$ and $g \in G$, and hence the affine group $A$ is a product $TG$ with $T \cap G = \{1\}$. Now you can check that $T$ is a normal subgroup of $A$ but $G$ is not. This is exactly when we get a semidirect product. For a direct product you would need both $T$ and $G$ to be normal in $A$.
The multiplication rule they give is just the way it happens to be. But you could equally well write the elements as $(v,M)$ rather than as $(M,v)$. I find $(v,M)$ more natural, because it fits better with the decomposition $tg$. The multiplication would then work out as $(v_1,M_1)(v_2,M_2) = (v_1+M_1v_2,M_1M_2)$, which I find more transparent, because it is the $M_1$ moving rightwards past the $v_2$ that is causing the $v_2$ to get replaced by $M_1v_2$ in the product. This is the way multiplication works in semidirect products in general.
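If you want to convince yourself of the rule $(v_1,M_1)(v_2,M_2) = (v_1+M_1v_2,M_1M_2)$, here is a small sanity check in plain Python. It is only a sketch: I am assuming $V = \mathbb{R}^2$, reading the pair $(v,M)$ as the map $x \mapsto Mx + v$, and the helper names (`mat_vec`, `compose`, and so on) are made up for illustration.

```python
# Affine maps on R^2 stored as pairs (v, M), meaning x -> M x + v.
# Vectors are 2-tuples, matrices are tuples of two rows (illustrative encoding).

def mat_vec(M, x):
    # Matrix-vector product M x.
    return tuple(sum(M[i][j] * x[j] for j in range(2)) for i in range(2))

def mat_mul(A, B):
    # Matrix product A B.
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def vec_add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def apply(pair, x):
    # Evaluate the affine map (v, M) at the point x.
    v, M = pair
    return vec_add(mat_vec(M, x), v)

def compose(p1, p2):
    # (v1, M1)(v2, M2) = (v1 + M1 v2, M1 M2): apply p2 first, then p1.
    (v1, M1), (v2, M2) = p1, p2
    return (vec_add(v1, mat_vec(M1, v2)), mat_mul(M1, M2))

p1 = ((1, 2), ((0, -1), (1, 0)))   # rotate by 90 degrees, then translate
p2 = ((3, 0), ((2, 0), (0, 1)))    # stretch the first coordinate, then translate
x = (5, 7)

lhs = apply(compose(p1, p2), x)    # product formula applied to x
rhs = apply(p1, apply(p2, x))      # actual composition of the two maps
print(lhs, rhs)
```

Both values agree, which is the semidirect product rule in action: the $M_1$ acts on $v_2$ as it moves past it.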
The assumption on the difference map can be re-written as follows. For all $x,y,z \in X$
$$d(y,z) = d(x,z) - d(x,y).$$
Using this, we have that for all $\hat{x}$
$$d(x+v, x+u) = d(\hat{x},x+u)-d(\hat{x},x+v).$$
In particular, we can pick $\hat{x}= x$ and get that
$$d(x+v,x+u) = d(x,x+u) - d(x,x+v) = u-v \in V.$$
(Note that if we let $v = 0$, we see that for all $u \in V$ we have that $u \in d(X' \times X')$. Thus, $V \subset d(X' \times X')$ and we also have the other inclusion since $u-v \in V$.)
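For a concrete feel for the identity $d(x+v,x+u) = u - v$, here is a quick numerical check. It is a sketch under an assumed model: $X = \mathbb{R}^2$ with $d(p,q) = q - p$, which matches $d(x,x+u) = u$ above. The function names are hypothetical.

```python
def d(p, q):
    # Difference map on X = R^2 (an assumed concrete model): d(p, q) = q - p.
    return tuple(b - a for a, b in zip(p, q))

def shift(p, v):
    # The action of V on X: the point p + v.
    return tuple(a + b for a, b in zip(p, v))

def sub(u, v):
    # Vector subtraction u - v in V.
    return tuple(a - b for a, b in zip(u, v))

x = (4, -1)                # base point of the subspace x + V
u, v = (2, 5), (-3, 1)     # two arbitrary vectors in V

left = d(shift(x, v), shift(x, u))   # d(x + v, x + u)
right = sub(u, v)                    # u - v
print(left, right)
```

The two sides coincide for this choice of $x$, $u$, $v$, as the derivation predicts.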
From this, we can easily see that $x+V$ is a vector space. Specifically, let $x+v_1, x+v_2, x+v_3, x+v_4 \in x+V$ for $v_1, v_2, v_3,v_4 \in V$. Since $V$ is a vector space, it contains $v_2+v_4$, $v_1+v_3$ and $v_2+v_4-(v_1+v_3)$.
Thus, we have that
$$d(x+v_1,x+v_2)+d(x+v_3,x+v_4) = v_2-v_1+v_4-v_3 = v_2+v_4 - (v_1+v_3) = d(x+v_1+v_3, x+v_2+v_4).$$
Similarly
$$ad(x+v_1, x+v_2) = d(x+av_1, x+av_2).$$
Thus, not only is it a vector space, but addition and multiplication function as you would expect them to.
As you've already noted for the second condition, we only need to check bijectivity. Let us fix $x+v \in X'$.
For injectivity, assume that there exist $x+v_1, x+v_2 \in X'$ such that
$$d(x+v, x+v_1) = d(x+v, x+v_2).$$
Then $v_1 - v = v_2 - v$, which implies that $v_1 = v_2$. Alternatively, we can just note that $x+v \in X$ and get injectivity from the difference function on $X$.
For surjectivity, let $u \in d(X' \times X') = V$. Let $w = u + v \in V$. Thus, $x + w \in x+V$. Finally $d(x+v, x+w) = w-v = u+v-v = u$. Thus, the map is surjective.
Best Answer
First point
This is correct.
As you correctly pointed out, if $f$ is linear, then $f(x)=Ax$ for some $n\times m$ matrix $A$.
In this case we have \begin{align} f(\gamma v_1+v_2) & = A(\gamma v_1+v_2) \\ & = A(\gamma v_1)+Av_2 \\ & = \gamma Av_1+Av_2 \\ & = \gamma f(v_1)+f(v_2). \end{align}
Hint for the second point
Consider $f\colon\mathbb{R}^3\to \mathbb{R}^3$ defined by $f(x)=Ax$ where $$ A= \begin{pmatrix} k & 0 & 0 \\ 0 & k & 0\\ 0 & 0 & k \end{pmatrix} $$ with $k\neq 0$ and $|k| \neq 1$.
Will this preserve length? Where will the vector $$\begin{pmatrix}1 \\ 0 \\0\end{pmatrix}$$ get mapped? And what will the length of its image be?
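If you would rather check the hint numerically than by hand, a minimal sketch follows. The value $k = 2$ is my own choice for illustration, and `scale` and `length` are hypothetical helper names.

```python
import math

def scale(k, x):
    # f(x) = A x with A = k * identity, so each coordinate is multiplied by k.
    return tuple(k * c for c in x)

def length(x):
    # Euclidean length of a vector.
    return math.sqrt(sum(c * c for c in x))

e1 = (1, 0, 0)
k = 2                      # illustrative choice of scaling factor
image = scale(k, e1)

# Compare the length before and after applying f.
print(length(e1), length(image))
```

Comparing the two printed numbers answers the question about length.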
Third point
Provided $A$ is invertible, yes, the affine function will be invertible with the inverse you described.
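As a quick check, here is a sketch that assumes the affine function has the form $f(x) = Ax + b$ and that the inverse in question is $y \mapsto A^{-1}(y - b)$. Both the form and the $2\times 2$ example data are my assumptions, and the helper names are hypothetical.

```python
from fractions import Fraction

def mat_vec(M, x):
    # Matrix-vector product for a 2x2 matrix given as two rows.
    return tuple(M[i][0] * x[0] + M[i][1] * x[1] for i in range(2))

def inverse_2x2(M):
    # Inverse via the adjugate formula; fails when the determinant is zero.
    (a, b), (c, e) = M
    det = a * e - b * c
    if det == 0:
        raise ValueError("A is not invertible")
    det = Fraction(det)    # exact arithmetic, no rounding
    return ((e / det, -b / det), (-c / det, a / det))

def f(A, b, x):
    # Assumed form of the affine function: f(x) = A x + b.
    Ax = mat_vec(A, x)
    return (Ax[0] + b[0], Ax[1] + b[1])

def f_inverse(A, b, y):
    # Assumed inverse: y -> A^{-1} (y - b).
    return mat_vec(inverse_2x2(A), (y[0] - b[0], y[1] - b[1]))

A = ((2, 1), (1, 1))       # determinant 1, so A is invertible
b = (3, -4)
x = (5, 7)

y = f(A, b, x)
recovered = f_inverse(A, b, y)
print(y, recovered)
```

Here `recovered` equals the original `x`, and a singular `A` raises an error instead, which is where the invertibility assumption enters.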