The proof requires two steps, each a separate $h$-principle: the first is the Smale-Hirsch theorem, as you said, and the second is Gromov's $A$-directed embedding theorem.
By restricting your embedding $V \times \Bbb R \to W \times \Bbb R$ to $V \times 0$, you obtain an embedding $V \hookrightarrow W \times \Bbb R$ and, as you concluded, you would like to position this embedded copy of $V$ so that the restriction of the projection $W \times \Bbb R \to W$ to it is an immersion $V \looparrowright W$.
Let $k = \dim V$ and $l = \dim W$. Let $A \subset \mathrm{Grass}(k, W \times \Bbb R)$ be the open subset of the Grassmannian of $k$-planes consisting of non-vertical planes, i.e., the planes which intersect the vertical lines $\{w\} \times \Bbb R$ transversely for all $w \in W$ (that is, they do not contain the vertical direction). You would like to homotope $V \hookrightarrow W \times \Bbb R$ to an embedding whose differential lands in $A$.
There is no obstruction at the level of differentials: given the tangent $k$-plane field along $V$ in $W \times \Bbb R$, one can always nudge it to a plane field which lands in $A$. This is because $k < l$, so $(l+1) - k \geq 2$: a $k$-dimensional tangent plane along $V$ in the $(l+1)$-dimensional manifold $W \times \Bbb R$ has at least two degrees of freedom in which to move, so we can simply rotate it.
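To make the count $(l+1) - k \geq 2$ concrete, here is a rough dimension count in a single fiber of the Grassmannian bundle. The notation $\partial_t$ for the vertical direction and $\mathrm{Gr}_k$ for the Grassmannian of a vector space are my own shorthand, and this only records the codimension of the vertical planes; it is not meant as a complete argument. A $k$-plane $P \subset \Bbb R^{l+1}$ containing $\partial_t$ is the same as a $(k-1)$-plane in the quotient $\Bbb R^{l+1}/\langle \partial_t \rangle \cong \Bbb R^{l}$, so

$$\dim \mathrm{Gr}_k(\Bbb R^{l+1}) = k(l+1-k), \qquad \dim \{P : \partial_t \in P\} = \dim \mathrm{Gr}_{k-1}(\Bbb R^{l}) = (k-1)(l+1-k),$$

$$\operatorname{codim} \{P : \partial_t \in P\} = k(l+1-k) - (k-1)(l+1-k) = l+1-k \geq 2 \quad \text{when } k < l.$$

So in each fiber the vertical planes form a subset of codimension at least two, and $A$ is its complement.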
There is one more condition for the $h$-principle to go through: $A$ needs to be a complete subset of the Grassmannian; see Ch. 4.6 in Eliashberg-Mishachev. This is easily checked for the subset above.
Given all of this, one can apply Gromov's theorem to homotope the embedding to be "$A$-directed", i.e., find a new embedding $V \hookrightarrow W \times \Bbb R$ whose differential lands in $A$. This concludes the second step in the proof of the exercise.
For a visually transparent proof of this corollary of the $A$-directed embedding theorem, see Rourke-Sanderson, "The compression theorem I" (II and III are also beautiful papers, which I highly recommend).
Note that what you wrote is a definition. For intuition, think of the $x_i$ as living on the manifold, whereas the $r_i$ live on $\mathbb{R}^n$. You want to define the partial derivative on a manifold by going back to the already defined partial derivative on $\mathbb{R}^n$. Therefore you need to change from coordinates on $M$ to coordinates on $\mathbb{R}^n$, and at the same time the function needs to be defined on (an open subset of) $\mathbb{R}^n$, so naturally you consider $f\circ \phi^{-1}$.
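Assuming the definition in question is the usual one, for a chart $(U, \phi)$ with $x_i = r_i \circ \phi$ it reads as follows (the chart and the function in the example are my own illustration, not taken from the question):

$$\frac{\partial f}{\partial x_i}(p) := \frac{\partial (f \circ \phi^{-1})}{\partial r_i}(\phi(p)).$$

For instance, take $M = \{(a,b) \in \mathbb{R}^2 : a > 0\}$ with the polar chart $\phi(a,b) = \left(\sqrt{a^2+b^2}, \arctan(b/a)\right)$, so that $\phi^{-1}(r_1, r_2) = (r_1 \cos r_2, r_1 \sin r_2)$, and let $f(a,b) = a^2 + b^2$. Then

$$(f \circ \phi^{-1})(r_1, r_2) = r_1^2, \qquad \frac{\partial f}{\partial x_1}(p) = 2\, x_1(p), \qquad \frac{\partial f}{\partial x_2}(p) = 0.$$

All the differentiation happens on $\mathbb{R}^2$ in the variables $r_1, r_2$; the chart only transports the answer back to $M$.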
Best Answer
Just to elaborate on Xiao's answer: since we regard tangent vectors as derivations, the differential (or pushforward) map is defined by
$$[f_{*,p}(X_p)] g = X_p( g \circ f); \ X_p \in T_pM, g \in C_{f(p)}^{\infty}(N), f: M \to N$$
Here $f_{*,p}(X_p) \in T_{f(p)}N$ (why?). Well, if you take $g,h \in C_{f(p)}^{\infty}(N)$, then
$$[f_{*,p}(X_p)] (gh) = X_p((gh) \circ f) = X_p((g \circ f) \cdot (h \circ f))$$
and now, since $X_p$ is a derivation,
$$\begin{aligned} X_p((g \circ f) \cdot (h \circ f)) &= X_p(g \circ f) \cdot h(f(p)) + g(f(p)) \cdot X_p(h \circ f) \\ &= [f_{*,p}(X_p)]g \cdot h(f(p)) + g(f(p)) \cdot [f_{*,p}(X_p)] h \end{aligned}$$
Linearity is also clear since $X_p$ is linear, so $f_{*,p}(X_p)$ is a derivation at $f(p)$, i.e. a tangent vector. Now take $f: M \to \mathbb{R}$ and let $(U,x^1,...,x^d)$ be a chart about $p$; then
$$\left\{\frac{\partial}{\partial x^1}\Bigr|_p,...,\frac{\partial}{\partial x^d}\Bigr|_p\right\}$$
is a basis for $T_pM$. Similarly, we can use the coordinate $t$ to parametrize a neighborhood of $f(p) \in \mathbb{R}$ and so $T_{f(p)}\mathbb{R}$ has basis vector;
$$\frac{\partial}{\partial t}\Bigr|_{f(p)} := \frac{d}{dt}\Bigr|_{f(p)}$$
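Since $f_{*,p}$ maps $T_pM$ linearly into the one-dimensional space $T_{f(p)}\mathbb{R}$, it sends each basis vector to a multiple of this basis vector, i.e. for some $\alpha \in \mathbb{R}$,
$$f_{*,p}\left(\frac{\partial}{\partial x^i}\Bigr|_p\right) = \alpha \frac{d}{dt}\Bigr|_{f(p)}$$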
If we apply both sides to the coordinate function $t$ and use the definition of the differential, we have
$$\left[f_{*,p}\left(\frac{\partial}{\partial x^i}\Bigr|_p\right)\right] t = \alpha \frac{d}{dt}\Bigr|_{f(p)} t \Rightarrow \frac{\partial}{\partial x^i}\Bigr|_p (t \circ f) = \frac{\partial}{\partial x^i}\Bigr|_p f = \alpha$$
The above follows from the fact that the coordinate function $t$ picks out the first (and only) coordinate of the map $f$, which is real-valued, so that $t \circ f$ is just $f$. It now follows that
$$f_{*,p}\left(\frac{\partial}{\partial x^i}\Bigr|_p\right) = \left(\frac{\partial}{\partial x^i}\Bigr|_p f\right) \frac{d}{dt}\Bigr|_{f(p)}$$
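As a quick sanity check of this formula, here is a small worked example; the manifold, function and point are my own choices for illustration. Take $M = \mathbb{R}^2$ with the standard chart $(x^1, x^2)$, $f(x^1, x^2) = (x^1)^2 + x^2$ and $p = (1, 2)$, so $f(p) = 3$. Then

$$f_{*,p}\left(\frac{\partial}{\partial x^1}\Bigr|_p\right) = 2 \, \frac{d}{dt}\Bigr|_{3}, \qquad f_{*,p}\left(\frac{\partial}{\partial x^2}\Bigr|_p\right) = \frac{d}{dt}\Bigr|_{3}$$

and by linearity, for a general tangent vector $X_p = a^1 \frac{\partial}{\partial x^1}\Bigr|_p + a^2 \frac{\partial}{\partial x^2}\Bigr|_p$,

$$f_{*,p}(X_p) = (2a^1 + a^2) \, \frac{d}{dt}\Bigr|_{3} = X_p(f) \, \frac{d}{dt}\Bigr|_{3}.$$

In other words, for a real-valued $f$ the pushforward of $X_p$ is just the number $X_p(f)$ times the basis vector of $T_{f(p)}\mathbb{R}$.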