Let $\mathscr F$ be a sheaf on $X$ and $\mathscr G$ a sheaf on $Y$ and let $f:X\to Y$ be continuous. Write $Pre(f^{-1}\mathscr G)$ for the presheaf $U\mapsto\varinjlim_{V\supset f(U)}\mathscr G(V)$. Then the sheaf $f^{-1}\mathscr G$ is the sheaf of continuous sections of the espace étalé $\operatorname{Spé}(Pre(f^{-1}\mathscr G))$, which is given the final topology induced by the sections of the presheaf. These sections are functions valued in the stalks of the presheaf. For open $V\subset Y$, the sections $\Gamma(V,f_*f^{-1}\mathscr G)$ are given by continuous sections of $\operatorname{Spé}(Pre(f^{-1}\mathscr G))$ over $f^{-1}(V)$.
Since the topology on $X$ is at least as fine as the initial topology induced by $f$, the topology on $\operatorname{Spé}(Pre(f^{-1}\mathscr G))$ is at least as coarse as the one induced by only the sections of the presheaf $f^{-1}(V)\mapsto\varinjlim_{V'\supset f(f^{-1}(V))}\mathscr G(V')$ for $V\subset Y$ (this is the presheaf that would arise from $f^{-1}\mathscr G$ if we had given $X$ the initial topology induced by $f$). There is a natural map $\mathscr G(V)\cong\varinjlim_{V'\supset V}\mathscr G(V')\xrightarrow{\rho}\varinjlim_{V'\supset f(f^{-1}(V))}\mathscr G(V')$, and a sufficient condition for $\rho$ to be an isomorphism is for $f$ to be epi, since then $f(f^{-1}(V))=V$. (More generally, $\rho$ is itself epi if $\mathscr G$ is flasque.)
If $s\in\mathscr G(V)$, there is a continuous section of the espace étalé $t:f^{-1}(V)\to\operatorname{Spé}(Pre(f^{-1}\mathscr G))$ where if $x\in f^{-1}(y)$, $t(x)=\rho(s)_y\in\operatorname{Spé}(Pre(f^{-1}\mathscr G))$. (It is continuous because it is the section defined by $\rho(s)\in Pre(f^{-1}\mathscr G)(f^{-1}(V))$.) The map $s\mapsto t$ induces a canonical morphism of sheaves $\varphi:\mathscr G\to f_*f^{-1}\mathscr G$. One sees that a section $s$ of $\mathscr G$ is in $\ker\varphi$ iff $f^{-1}(\operatorname{supp} s)=\emptyset$. In particular, if $f$ is an epimorphism, $\varphi$ is a monomorphism.
If moreover $X$ has the initial topology induced by $f$, and $f$ is an epimorphism, then $\varphi$ is an isomorphism; if $f$ is not epi but instead $f(X)\subset Y$ is open (and $X$ has the initial topology induced by $f$) then $\varphi$ induces an isomorphism of sheaves $f_*f^{-1}\mathscr G\cong\left.\mathscr G\right|_{f(X)}$.
As for stalks, let $x\in X$ and $y=f(x)\in Y$. Looking first at the stalks of the presheaf $f_*(Pre(f^{-1}\mathscr G))$ only, we have, for $y$ in the image of $f$ as above,
$$(f_*(Pre(f^{-1}\mathscr G)))_y
=\varinjlim_{V\ni y}\varinjlim_{V'\supset f(f^{-1}(V))}\mathscr G(V')=\mathscr G_y.$$
Here each $V'\supset f(f^{-1}(V))$ contains $y$ because $y\in\operatorname{im} f$, and $V$ itself is one such $V'$, so the double colimit runs over a cofinal system of open neighborhoods of $y$.
Since $(f_*(Pre(f^{-1}\mathscr G)))_y$ may be canonically identified with a subset of $(f_*f^{-1}\mathscr G)_y$, if $y\in\operatorname{im} f$, $\varphi_y$ is a monomorphism. This makes sense, since we know that $\varphi$ is a monomorphism iff $\varphi_y$ is mono on all stalks. Now consider general stalks of the sheaf $f_*f^{-1}\mathscr G$. They are in general much larger than stalks of the presheaf discussed above, since if, say, $\left|f^{-1}(y)\right|>1$ and there is a neighborhood $V$ of $y$ such that $f^{-1}(V)$ has one connected component $C_i\subset X$ for each point $x_i$ in $f^{-1}(y)$, then $(f_*f^{-1}\mathscr G)_y$ will be isomorphic to $\prod_i\mathscr G_y$ under a mild assumption described below, and the map $\varphi_y:\mathscr G_y\hookrightarrow (f_*f^{-1}\mathscr G)_y$ will be the diagonal embedding.
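The simplest instance can be worked out by hand. As an illustration (the spaces below are chosen here and are not part of the argument above), let $Y=\{y\}$ be a point, let $X=\{x_1,x_2\}$ be discrete, and let $f$ be the unique map. Then $Pre(f^{-1}\mathscr G)$ takes the value $\mathscr G_y$ on every nonempty open set, its sheafification is the constant sheaf with stalk $\mathscr G_y$, and
$$(f_*f^{-1}\mathscr G)_y=(f^{-1}\mathscr G)(X)=(f^{-1}\mathscr G)(\{x_1\})\times(f^{-1}\mathscr G)(\{x_2\})\cong\mathscr G_y\times\mathscr G_y,\qquad\varphi_y(s)=(s,s).$$
So in this case the stalk of the sheaf is strictly larger than the stalk of the presheaf as soon as $\mathscr G_y$ has more than one element.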
Suppose $f$ is injective and let $g\in (f_*f^{-1}\mathscr G)(V)$ for $V\ni y$. Then there exists an open set $U\subset f^{-1}(V)$ containing $x=f^{-1}(y)$ such that $g(p)=t_p$ for all $p\in U$, for some $t\in\varinjlim_{V'\supset f(U)}\mathscr G(V')$.* If $f(U)$ contains a neighborhood $W$ of $y$, then $\left.g\right|_{f^{-1}(W)}$ represents the same class in the stalk $(f_*f^{-1}\mathscr G)_y$ as $g$, and this class is represented by an element of $\mathscr G(W)$, as in that case, $\varinjlim_{V'\supset f(f^{-1}(W))}\mathscr G(V')=\varinjlim_{V'\supset W}\mathscr G(V')\cong\mathscr G(W)$. So one can make the local statement that if $f^{-1}(y)=\{x\}$ and the preimage under $f$ of a local (neighborhood) system at $y$ is a local (neighborhood) base for $x$, then $(f_*f^{-1}\mathscr G)_y=\mathscr G_y$. In particular, if $f$ is an open embedding, then it induces an isomorphism on stalks between $f_*f^{-1}\mathscr G$ and $\left.\mathscr G\right|_{f(X)}$ at every point of $f(X)$, and hence an isomorphism of sheaves $\left.f_*f^{-1}\mathscr G\right|_{f(X)}\xrightarrow{\sim}\left.\mathscr G\right|_{f(X)}$.
*N.B. If the local condition above is not met, the section $t\in\varinjlim_{V'\supset f(U)}\mathscr G(V')$ may be hard to identify: since $f(U)$ need not be open, $t$ might be an equivalence class of sections quite different from a germ at $y$.
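The open-embedding case can be made explicit. As a sketch, write $j:U\hookrightarrow Y$ for the inclusion of an open subset (the name $j$ is introduced here only for illustration). Every open $W\subset U$ is already open in $Y$, so the colimit defining the inverse image is attained at $W$ itself, and for open $V\subset Y$ one gets
$$(j^{-1}\mathscr G)(W)=\mathscr G(W),\qquad(j_*j^{-1}\mathscr G)(V)=\mathscr G(V\cap U),\qquad\varphi_V=\rho^{V}_{V\cap U}:\mathscr G(V)\to\mathscr G(V\cap U).$$
For $y\in U$ the neighborhoods $V\subset U$ are cofinal and $\varphi_V$ is the identity on them, which gives $(j_*j^{-1}\mathscr G)_y=\mathscr G_y$.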
For the first question: yes, the morphism of presheaves $i$ is injective, as its kernel is zero.
For the second question: I believe that "natural" here means canonical, since it comes out of the universal property of sheafification. I realise that this is confusing since "natural" has a specific meaning regarding compatibility with functors in category theory.
Best Answer
Since you already know what a presheaf is, it may be convenient for you to see why a presheaf is a functor. Basically it amounts to the existence of restriction maps and certain conditions on them.
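Take a presheaf of sets $F$ on your topological space $X$. The condition of being a presheaf is that:
- for each open set $U\subset X$ there is a set $FU$;
- for each inclusion of open sets $U\subset V$ there is a restriction function $\rho^V_U:FV\to FU$;
- $\rho^U_U$ is the identity on $FU$;
- for open sets $U\subset V\subset W$, we have $\rho^V_U\circ\rho^W_V=\rho^W_U$.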
These four conditions are exactly the conditions that "$FU$ is functorial on the open sets $U$". Writing $i^U_V:U\to V$ for the inclusion of $U$ into $V$, one could denote the corresponding restriction function as $\rho^V_U=F(i^U_V)$. Denoting the category of open sets of $X$ by $\text{Top}_X$ and the category of sets by $\text{Set}$, then $F$ is a functor $F:\text{Top}_X^{\text{op}}\to\text{Set}$. ($\text{op}$ because $\rho^V_U=F(i^U_V)$ goes "in the opposite direction" as $i^U_V$).
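For a concrete instance (an illustration not taken from the discussion above), take $FU$ to be the set of continuous functions $U\to\mathbb R$, with restriction of functions as $\rho^V_U$. For open sets $U\subset V\subset W$ and $h\in FW$, the functoriality conditions then read
$$\rho^W_W(h)=h|_W=h,\qquad\rho^V_U(\rho^W_V(h))=(h|_V)|_U=h|_U=\rho^W_U(h).$$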
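If you denote the categories of sheaves on $X$ and $Y$ by $\text{Sh}X$ and $\text{Sh}Y$, respectively, then for any continuous function $f:X\to Y$ you obtain a functor $f^{-1}:\text{Sh}Y\to\text{Sh}X$. Explicitly, this means that
- for each sheaf $G$ on $Y$, $f^{-1}G$ is a sheaf on $X$;
- for each morphism of sheaves $\varphi:G\to G'$ on $Y$ there is a morphism of sheaves $f^{-1}\varphi:f^{-1}G\to f^{-1}G'$ on $X$;
- $f^{-1}(\text{id}_G)=\text{id}_{f^{-1}G}$;
- for morphisms $\varphi:G\to G'$ and $\psi:G'\to G''$, we have $f^{-1}(\psi\circ\varphi)=f^{-1}\psi\circ f^{-1}\varphi$.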
So you need to check the last 3 conditions on $f^{-1}$.
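One possible way to organise this check, assuming $f^{-1}G$ is defined as the sheafification of the presheaf $U\mapsto\varinjlim_{V\supset f(U)}G(V)$: a morphism of sheaves $\varphi:G\to G'$ on $Y$ (the names $\varphi$ and $G'$ are introduced here for illustration) commutes with restrictions, so it induces a map on each colimit,
$$\varinjlim_{V\supset f(U)}G(V)\longrightarrow\varinjlim_{V\supset f(U)}G'(V),\qquad[s]\mapsto[\varphi_V(s)]\quad(s\in G(V)),$$
and $f^{-1}\varphi$ is the map this induces on sheafifications. The identity and composition conditions hold on representatives $s$, and they pass to the sheafification because sheafification is itself a functor.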