You can apply the push forward pointwise. In fact, we define $(\phi_*v)_q$ to be $\phi_*(v_p)$, where $\phi(p) = q$. What you're asked to show is that the function $q\mapsto (\phi_*v)_q$
For the first part, you'll need to use the fact that $\phi$ is one-to-one.
The fundamental intuition is that it doesn't matter which manifold you do your calculations in; you get the same result either way.
This was already clear in the case of coordinate charts; calculus on a manifold is often defined in terms of what you get using coordinates to map the problem over to Euclidean space. The point is that the idea extends to more general manifolds than just Euclidean space.
Given a vector on $M_1$ and a scalar field on $M_2$, there are two ways you might combine them to get a directional derivative: either pull the problem back to $M_1$ or push it forward to $M_2$. The identity you cite is the one that asserts you get the same answer both ways.
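If you want to see the two routes agree numerically, here is a minimal sketch. The map `phi`, the scalar field `f`, the point `p`, and the vector `v` are all hypothetical choices made only for illustration; nothing about them comes from the question. The push-forward side applies the Jacobian of `phi` to `v` and then differentiates `f` along the result; the pullback side differentiates $f\circ\phi$ along `v` directly, using a central finite difference.

```python
import math

def phi(p):
    # Hypothetical smooth map R^2 -> R^2, chosen only for illustration.
    x, y = p
    return (x * math.cos(y), x * math.sin(y))

def dphi(p, v):
    # Push forward: apply the Jacobian of phi at p to the vector v.
    x, y = p
    a, b = v
    return (math.cos(y) * a - x * math.sin(y) * b,
            math.sin(y) * a + x * math.cos(y) * b)

def f(q):
    # Hypothetical scalar field on the target manifold.
    s, t = q
    return s * s * t + math.exp(t)

def grad_f(q):
    # Gradient of f, computed by hand.
    s, t = q
    return (2 * s * t, s * s + math.exp(t))

def pushforward_side(p, v):
    # (phi_* v)(f): derivative of f at phi(p) along the pushed-forward vector.
    w = dphi(p, v)
    g = grad_f(phi(p))
    return g[0] * w[0] + g[1] * w[1]

def pullback_side(p, v, h=1e-6):
    # v(phi^* f): derivative of t -> f(phi(p + t v)) at t = 0,
    # approximated by a central difference with step h.
    plus = f(phi((p[0] + h * v[0], p[1] + h * v[1])))
    minus = f(phi((p[0] - h * v[0], p[1] - h * v[1])))
    return (plus - minus) / (2 * h)

p = (1.5, 0.7)
v = (0.3, -1.2)
print(pushforward_side(p, v), pullback_side(p, v))
```

The two printed numbers should agree up to finite-difference error, which is just the chain rule in disguise.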
Anyways, I think there is an extremely compelling algebraic rationale for this.
Suppose you are doing calculus in one variable $x$ and later decide you need a second independent variable $y$. This changes absolutely nothing about the calculations you've done: if $y$ doesn't appear in any of your calculations, everything is as if it didn't exist at all.
That is, $\mathrm{d} \sin(x) = \cos(x) \mathrm{d} x$ is always true; it doesn't matter whether or not you have any other variables, or whether or not $x$ depends on any of them.
Consider the case where $\phi$ is the projection map onto the first component, $\phi : \mathbb{R}^2 \to \mathbb{R}$, $\phi(x, y) = x$, using standard coordinates on both.
The pullback $\phi^*$ on scalar fields is precisely the "add in the variable $y$" operation. The push forward $\phi_*$ on vectors expresses the fact that only the $x$-direction matters. It's clear that if we have $v \in T\mathbb{R}^2$ and $f \in \mathcal{C}^1(\mathbb{R})$, then we expect
$$ (\phi_* v)(f) = v(\phi^* f) $$
because both formulas are expressing exactly the same operation.
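To spell this out for the projection $\phi(x, y) = x$, here is a short worked check. The components $a, b$ and the base point $(x_0, y_0)$ are notation I'm introducing for illustration. Take $v = a\,\partial_x + b\,\partial_y$ at $(x_0, y_0)$. Since $(\phi^* f)(x, y) = f(x)$ has no $y$-dependence,
$$ v(\phi^* f) = a\,\frac{\partial}{\partial x} f(x)\Big|_{x_0} + b\,\frac{\partial}{\partial y} f(x)\Big|_{(x_0, y_0)} = a\, f'(x_0). $$
On the other side, the Jacobian of $\phi$ is $\begin{pmatrix} 1 & 0 \end{pmatrix}$, so $\phi_* v = a\,\frac{\mathrm{d}}{\mathrm{d}x}$ at $x_0$, and
$$ (\phi_* v)(f) = a\, f'(x_0). $$
The $b$ component drops out in both computations, for the same reason each time.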
Write the differential of $\phi':=i_N\circ\phi\circ i_M^{-1}:\mathbb{R}^m\to\mathbb{R}^n$ in local coordinates, $$\mathrm{D}\phi'(i_M(p))=\mathrm{D}\left(i_N\circ\phi\circ i_M^{-1}\right)(i_M(p)):\mathbb{R}^m\to\mathbb{R}^n,$$ and identify $T_pM\subset T_{i_M(p)}\mathbb{R}^m\cong\mathbb{R}^m$ and $T_{\phi(p)}N\subset T_{i_N(\phi(p))}\mathbb{R}^n\cong\mathbb{R}^n$. One can then restrict $\mathrm{D}\phi'(i_M(p))$ to a map $T_pM\to T_{\phi(p)}N$ to obtain $\mathrm{D}\phi(p)$.