I have seen in many textbooks that the pullback of an arbitrary tensor field $T$ of type $(r,s)$ on $N$ under a diffeomorphism $\phi:M \rightarrow N$ is defined as
$\phi^* T(\eta_1,\dots, \eta_r, X_1, \dots, X_s) = T( (\phi^{-1})^*(\eta_1), \dots, (\phi^{-1})^*(\eta_r), \phi_* X_1, \dots, \phi_* X_s)$
where $\eta_i \in T_p^*(M)$ is a covector and $X_j \in T_p(M)$ is a vector. So, in the case of the metric tensor this would reduce to the following:
$\phi^*g(X,Y) = g(\phi_*X, \phi_*Y)$
where $X,Y$ are vectors.
Now, at the same time, Wikipedia suggests that we could find the pullback of such a tensor as
$\phi^*g(X,Y) = g(\phi X, \phi Y)$
and my question is: how do they get $\phi_*$ to become $\phi$?
Best Answer
If $\phi:V \to W$ is a linear map, then for any $p\in V$, the tangent mapping/pushforward mapping $T\phi_p$ or $d\phi_p$ or $\phi_{*,p}$ (whichever notation you prefer) is a linear map $T_pV \to T_{\phi(p)}W$. But for a vector space, each tangent space can be canonically identified with the vector space itself: $T_pV \cong V$ and $T_{\phi(p)}W\cong W$. Because of this, you can "think of" the tangent mapping as a map $V \to W$. Under this identification it is simply the derivative $D\phi_p$ of the linear transformation $\phi:V \to W$ at the point $p \in V$. But a linear transformation is its own derivative.
If you want a more precise formulation of what I said above, here it is: on any (say finite-dimensional) vector space $V$, and any $p \in V$, there is a canonical isomorphism $\xi_{V,p}:T_pV \to V$. Note that the exact construction of this isomorphism will depend on which definition of tangent space you're using, but in any case, it is a good idea to prove this yourself. Similarly we have an isomorphism $\xi_{W,\phi(p)}:T_{\phi(p)}W \to W$. If you unwind the definitions of everything, you'll see that the following diagram commutes:
$\require{AMScd}$
\begin{CD} T_pV @>{\phi_{*,p}}>> T_{\phi(p)}W \\ @V{\xi_{V,p}}VV @VV{\xi_{W,\phi(p)}}V \\ V @>>{D\phi_p = \phi}> W \end{CD}
In other words, $\phi = \xi_{W,\phi(p)} \circ \phi_{*,p} \circ (\xi_{V,p})^{-1}$, or said differently once again, up to isomorphisms, for each $p \in V$, we have $\phi_{*,p} = \phi$. But all of this is only because $\phi$ is a linear transformation.
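As a concrete sketch, assume $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$ with their standard coordinates, so that $\phi(x) = Ax$ for some matrix $A$ and $g$ is a bilinear form on $W$ with matrix $G$ (the symbols $A$ and $G$ are introduced here only for illustration). Then the derivative of $\phi$ at every point is $A$ again, and the pullback formula collapses to the one with $\phi$ in place of $\phi_*$:
\begin{align} D\phi_p(X) &= \frac{d}{dt}\Big|_{t=0} A(p + tX) = AX = \phi(X), \\ (\phi^*g)(X,Y) &= g(\phi_{*,p}X, \phi_{*,p}Y) = (AX)^T G\,(AY) = X^T (A^T G A)\, Y = g(\phi X, \phi Y). \end{align}
Note that the result does not depend on the base point $p$, which is exactly why the identification $\phi_{*,p} = \phi$ is harmless in the linear setting.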
But in the general case, if you have smooth manifolds $M,N$, and you have a metric tensor $g$ on $N$ and a diffeomorphism $\phi:M \to N$, there is no reason to even expect that $M,N$ have vector space structures, so it doesn't even make sense to talk about $\phi$ being linear. This is why we have to use the push-forward map, and there is no sense in which we can "identify" the push-forward with the original map itself.
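For a quick illustration of the nonlinear case (an example chosen here, not taken from the question), take polar coordinates: $M = (0,\infty)\times(0,2\pi)$, $N = \mathbb{R}^2$ with the non-negative $x$-axis removed, $\phi(r,\theta) = (r\cos\theta, r\sin\theta)$, and $g = dx^2 + dy^2$ the Euclidean metric on $N$. In these coordinates the push-forward is the Jacobian matrix $J$, and the pullback metric has matrix $J^T J$:
$$ \phi_{*,(r,\theta)} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}, \qquad \phi^*g = dr^2 + r^2\, d\theta^2 . $$
Here an expression like $g(\phi(X), \phi(Y))$ is not even meaningful: $X$ and $Y$ are tangent vectors at $(r,\theta)$, not points of $M$, so $\phi$ cannot be applied to them.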
See this for a more general perspective of everything I mentioned here (with slightly different notation).
Edit: In response to comment.
The author DOES NOT say $(s^{-1})^*g(x,y) = g(s^{-1}(x), s^{-1}(y))$. He says \begin{align} d_{B^n}(x,y) &= [(s^{-1})^*d_{\mathcal{H}^n}](x,y) = d_{\mathcal{H}^n}(s^{-1}(x), s^{-1}(y)) \end{align} These are completely different statements. Suppose you have two (let's for simplicity say simply connected) Riemannian manifolds $(M,g)$ and $(N,h)$. Then the metric tensors $g$ and $h$ give rise to distance functions $d_g$ and $d_h$ respectively (in the article, the author refers to these as $d_{B^n}$ and $d_{\mathcal{H}^n}$). Now, suppose we have a diffeomorphism $\phi:M \to N$. Then, we can consider the following objects:
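Written out from the notation used in the rest of this answer (so the exact form is a reconstruction rather than a quotation), the two objects are presumably the following, both of which live on $N$:
\begin{align} \big[(\phi^{-1})^*g\big]_q(X,Y) &:= g_{\phi^{-1}(q)}\big((\phi^{-1})_{*,q}X,\ (\phi^{-1})_{*,q}Y\big), && q \in N,\ X,Y \in T_qN, \\ \big[(\phi^{-1})^*d_g\big](x,y) &:= d_g\big(\phi^{-1}(x),\ \phi^{-1}(y)\big), && x,y \in N. \end{align}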
Note that although we are using the same notation $(\phi^{-1})^*$, and calling both of them "pullbacks", these are completely different things. The first is a pullback of a tensor field, while the second is a pullback of a distance function. The word "pullback" should be thought of literally, as the name suggests: you have a certain object defined on one space (e.g. a tensor field or a distance function), and you have an invertible map between two spaces. Then, you can use this map to "transport" the object to the other space.
Now, here is a theorem which you should try to prove (it is really just an exercise in unwinding all the definitions).
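Judging from the explanation that follows, the statement is presumably this (a reconstruction in the notation above, with a sketch of the argument):

Theorem. Let $(M,g)$ and $(N,h)$ be Riemannian manifolds and $\phi:M \to N$ a diffeomorphism. If $h = (\phi^{-1})^*g$, then $d_h = (\phi^{-1})^*d_g$, i.e. $d_h(x,y) = d_g(\phi^{-1}(x), \phi^{-1}(y))$ for all $x,y \in N$.

Sketch: for a piecewise smooth curve $\gamma:[a,b] \to N$, the chain rule gives $(\phi^{-1})_{*}\,\gamma'(t) = (\phi^{-1}\circ\gamma)'(t)$, so
\begin{align} L_h(\gamma) &= \int_a^b \sqrt{h_{\gamma(t)}\big(\gamma'(t), \gamma'(t)\big)}\, dt = \int_a^b \sqrt{g_{\phi^{-1}(\gamma(t))}\big((\phi^{-1}\circ\gamma)'(t),\ (\phi^{-1}\circ\gamma)'(t)\big)}\, dt = L_g(\phi^{-1}\circ\gamma). \end{align}
Since $\gamma \mapsto \phi^{-1}\circ\gamma$ is a bijection between curves in $N$ from $x$ to $y$ and curves in $M$ from $\phi^{-1}(x)$ to $\phi^{-1}(y)$, taking the infimum of the lengths on both sides gives the claim.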
What this says is that if your metric tensors are related to each other by a pullback, then so are the associated distance functions. Note that this is precisely what the author is saying in the first sentence of his proof: