Since everybody else is throwing derived categories at you, let me take another approach and give a more lowbrow explanation of how you might have come up with the idea of using injectives. I'll take for granted that you want to associate to each object (sheaf) $F$ a bunch of abelian groups $H^i(F)$ with $H^0(F)=\Gamma(F)$, and that you want a short exact sequence of objects to yield a long exact sequence in cohomology.
I also want one more assumption, which I hope you find reasonable: if $F$ is an object such that for any short exact sequence $0\to F\to G\to H\to 0$ the sequence $0\to \Gamma(F)\to \Gamma(G)\to \Gamma(H)\to 0$ is exact, then $H^{i}(F)=0$ for $i>0$. This roughly says that $H^{i}$ is zero unless it's forced to be non-zero by a long exact sequence (you might be able to run this argument using this assumption only for $i=1$, but I'm not sure). Note that this implies that injective objects have trivial $H^{i}$, since any short exact sequence with $F$ injective splits.
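Spelled out, the splitting step goes like this: if $F$ is injective, the identity of $F$ extends along $F\to G$ to a retraction $r\colon G\to F$, so $G\cong F\oplus H$ compatibly with the map to $H$, and since $\Gamma$ is additive the map on global sections is a projection:
$$\Gamma(G)\cong\Gamma(F)\oplus\Gamma(H)\twoheadrightarrow\Gamma(H).$$
Together with left exactness of $\Gamma$, this makes $0\to\Gamma(F)\to\Gamma(G)\to\Gamma(H)\to 0$ exact, so the assumption gives $H^{i}(F)=0$ for $i>0$.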
Now suppose I come across an object $F$ that I'd like to compute the cohomology of. I already know that $H^{0}(F)=\Gamma(F)$, but how can I compute any higher cohomology groups? I can embed $F$ into an injective object $I^{0}$, giving me the exact sequence $0\to F\to I^{0}\to K^{1}\to 0$. The long exact sequence in cohomology gives me the exact sequence
$$0\to \Gamma(F)\to \Gamma(I^{0})\to \Gamma(K^{1})\to H^{1}(F)\to 0 = H^1(I^{0})$$
That's pretty good; it tells us that $H^{1}(F)= \Gamma(K^{1})/\mathrm{im}(\Gamma(I^{0}))$, so we've computed $H^{1}(F)$ using only global sections of some other sheaves. We'll come back to this, but let's make some other observations first.
The other thing you learn from the long exact sequence associated to the short exact sequence $0\to F\to I^{0}\to K^{1}\to 0$ is that for $i>0$, you have
$$H^{i}(I^{0}) = 0\to H^{i}(K^{1})\to H^{i+1}(F)\to 0 = H^{i+1}(I^{0})$$
This is great! It tells you that $H^{i+1}(F)=H^{i}(K^{1})$. So if you've already figured out how to compute $i$-th cohomology groups, you can compute $(i+1)$-th cohomology groups! So we can proceed by induction to calculate all the cohomology groups of $F$.
Concretely, to compute $H^{2}(F)$, you'd have to compute $H^{1}(K^{1})$. How do you do that? You choose an embedding into an injective object $I^{1}$ and consider the long exact sequence associated to the short exact sequence $0\to K^{1}\to I^{1}\to K^{2}\to 0$ and repeat the argument in the third paragraph.
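Written out for $H^{2}$: the long exact sequence of $0\to K^{1}\to I^{1}\to K^{2}\to 0$ begins
$$0\to \Gamma(K^{1})\to \Gamma(I^{1})\to \Gamma(K^{2})\to H^{1}(K^{1})\to H^{1}(I^{1}) = 0,$$
so, using $H^{2}(F)=H^{1}(K^{1})$ (the case $i=1$ of $H^{i+1}(F)=H^{i}(K^{1})$),
$$H^{2}(F)\cong H^{1}(K^{1})\cong \Gamma(K^{2})/\mathrm{im}(\Gamma(I^{1})).$$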
Notice that when you proceed inductively, you construct the injective resolution
$$0\to F\to I^{0}\to I^{1}\to I^{2}\to\cdots$$
so that the image of the map $I^{i-1}\to I^{i}$ (which is equal to the kernel of the map $I^{i}\to I^{i+1}$) is $K^{i}$. If you like, you can define $K^{0}=F$. Now by induction you get that
$$H^{i}(F) = H^{i-1}(K^{1}) = H^{i-2}(K^{2}) = \cdots = H^{1}(K^{i-1}) = \Gamma(K^{i})/\mathrm{im}(\Gamma(I^{i-1})).$$
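Since $\Gamma$ is left exact and the sequence $0\to K^{i}\to I^{i}\to I^{i+1}$ is exact, $\Gamma(K^{i})$ is equal to the kernel of the map $\Gamma(I^{i})\to \Gamma(I^{i+1})$. Moreover, because $\Gamma(K^{i})\to\Gamma(I^{i})$ is injective, the image of $\Gamma(I^{i-1})$ in $\Gamma(K^{i})$ is identified with the image of the map $\Gamma(I^{i-1})\to\Gamma(I^{i})$. That is, we've shown that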
$$H^{i}(F) = \ker[\Gamma(I^{i})\to \Gamma(I^{i+1})]/\mathrm{im}[\Gamma(I^{i-1})\to \Gamma(I^{i})].$$
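The final formula is just the cohomology of the complex $\Gamma(I^{0})\to\Gamma(I^{1})\to\cdots$, so once the global sections and maps are known, the rest is linear algebra. As a toy illustration only, assuming each $\Gamma(I^{i})$ is a finite-dimensional $\mathbb{Q}$-vector space with the differentials given as matrices (real injective resolutions are infinite and their global sections are usually enormous), the hypothetical helpers `rank`, `is_complex` and `cohomology_dims` below compute $\dim H^{i} = \dim\ker d^{i} - \operatorname{rank} d^{i-1}$.

```python
from fractions import Fraction

# Toy model (an assumption, not part of the argument above): each Gamma(I^i) is a
# finite-dimensional Q-vector space and d^i: Gamma(I^i) -> Gamma(I^{i+1}) is a matrix
# with dims[i+1] rows and dims[i] columns. The complex is taken to be zero past the last term.

def rank(matrix):
    """Rank over Q via Gaussian elimination with exact Fraction arithmetic."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_complex(differentials):
    """Check that d^{i+1} composed with d^i is zero for consecutive differentials."""
    for d, e in zip(differentials, differentials[1:]):
        ncols = len(d[0]) if d else 0
        for row in e:
            for j in range(ncols):
                if sum(Fraction(row[k]) * Fraction(d[k][j]) for k in range(len(row))) != 0:
                    return False
    return True

def cohomology_dims(dims, differentials):
    """dim of ker[Gamma(I^i) -> Gamma(I^{i+1})] / im[Gamma(I^{i-1}) -> Gamma(I^i)] for each i."""
    assert is_complex(differentials), "consecutive differentials must compose to zero"
    ranks = [rank(differentials[i]) if i < len(differentials) else 0 for i in range(len(dims))]
    return [dims[i] - ranks[i] - (ranks[i - 1] if i > 0 else 0) for i in range(len(dims))]

# Q --1--> Q --0--> Q : only the last group survives.
print(cohomology_dims([1, 1, 1], [[[1]], [[0]]]))  # prints [0, 0, 1]
```

Working over a field keeps the sketch to rank counting; over $\mathbb{Z}$ one would need Smith normal form to see torsion in the quotient.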
Whew! That was kind of long, but we've shown that if you make a few reasonable assumptions and some easy observations and then follow your nose, you arrive at injective resolutions as a way to compute cohomology.
Notation: $f:X \to Y$ is the map we're pushing forward along, and $F$ is our sheaf on $X$. In general the stalks of $f_*F$ at different points will not be isomorphic. For instance, if $y \in Y$ is not in the closure of $f(X)$, then $y$ has an open neighborhood $V$ with $f^{-1}(V)=\emptyset$, so the stalk of $f_*F$ at $y$ is $0$, while it will typically be nonzero at points in the image.
An extreme case is when the image of the map is a single point. Then you get a skyscraper sheaf supported on the closure of that point, which is very far from constant for most spaces and most choices of point (note: if the point you hit is the generic point of $Y$, then the direct image will in fact be constant).
Edit: Another extreme case is when $X$ is a large discrete space. Then one can get direct image sheaves where no stalk is isomorphic to any other stalk. For instance this happens if all the fibers of $f$ have different cardinalities. I think you would usually need the axiom of choice to even define such a map.
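On finite spaces these cases can be checked by hand, because the stalk of a sheaf at $y$ is its value on the smallest open set $U_y$ containing $y$. The sketch below assumes, purely for illustration, that $X$ is a finite discrete space and $F$ is the sheaf of all $k$-valued functions on $X$, so that $\dim_k (f_*F)_y = |f^{-1}(U_y)|$; the function names and the example spaces are illustrative choices, not anything from the answer. Mapping everything to the generic point of the Sierpiński space gives equal stalks, mapping to the closed point gives a skyscraper with zero stalk at the generic point, and with discrete $Y$ the stalk dimensions are just the fiber sizes.

```python
from itertools import combinations

# Toy model (assumption): a finite space is given by the list of its open sets (frozensets).

def minimal_open_nbhd(point, opens):
    """Smallest open set containing `point`: the intersection of all opens that contain it."""
    result = None
    for u in opens:
        if point in u:
            result = u if result is None else result & u
    return result

def discrete_topology(points):
    """All subsets of `points`, i.e. the discrete topology."""
    pts = sorted(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def pushforward_stalk_dims(f, y_opens):
    """Dimensions of the stalks of f_*F on a finite space Y.

    Assumes X is finite and discrete and F is the sheaf of all k-valued functions on X,
    so dim F(W) = |W|. `f` is a dict X -> Y. On a finite space the stalk at y equals the
    sections over the minimal open neighbourhood U_y, so (f_*F)_y = F(f^{-1}(U_y)).
    """
    points = frozenset().union(*y_opens)
    dims = {}
    for y in points:
        u = minimal_open_nbhd(y, y_opens)
        dims[y] = sum(1 for fx in f.values() if fx in u)  # |f^{-1}(U_y)|
    return dims

# Sierpinski space: 'eta' is the open (generic) point, 's' the closed point.
SIERPINSKI = [frozenset(), frozenset({"eta"}), frozenset({"eta", "s"})]
X = [1, 2, 3]

print(pushforward_stalk_dims({x: "eta" for x in X}, SIERPINSKI))  # image = generic point
print(pushforward_stalk_dims({x: "s" for x in X}, SIERPINSKI))    # image = closed point
print(pushforward_stalk_dims({0: "a", 1: "b", 2: "b", 3: "c", 4: "c", 5: "c"},
                             discrete_topology({"a", "b", "c"})))   # fibers of sizes 1, 2, 3
```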
This answer is inspired by the Embedding Calculus (aka Manifold Calculus) of Weiss and Goodwillie. This is a framework for studying certain presheaves on manifolds. The idea is that sheafification of a presheaf is analogous to the linearization of a function. From this point of view, sheafification is just the first in a sequence of approximations: for each $n$ there is the universal approximation of degree $n$. What I describe below is the difference between the quadratic and the linear approximation, which one may think of as the principal part of the difference between a presheaf and its sheafification. I am not sure if this approach is useful in the context of algebraic geometry, or for the applications that you have in mind. But let me put it out here, FWIW.
Let ${\mathcal F}$ be a presheaf on $X$. Suppose $x$ and $y$ are two points in $X$ that can be separated by disjoint open sets. Let us define the "bi-stalk" of $\mathcal F$ at $(x,y)$ as ${\mathcal F}_{(x,y)}=\operatorname{colim}_{U,V} \mathcal F(U\cup V)$, where $(U, V)$ ranges over pairs of disjoint neighborhoods of $x$ and $y$. There is a natural homomorphism from the bi-stalk to the product of stalks ${\mathcal F}_{(x,y)}\to {\mathcal F}_{x}\times {\mathcal F}_{y}$. If $\mathcal F$ is a sheaf then this homomorphism is an isomorphism, since $\mathcal F(U\cup V)\cong\mathcal F(U)\times\mathcal F(V)$ for disjoint $U,V$ and filtered colimits commute with finite products. So you have a homomorphism for each such pair that measures the failure of $\mathcal F$ to be a sheaf.
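On a finite space the bi-stalk is easy to compute: the pair of minimal open neighborhoods $(U_x, U_y)$ is the smallest pair of disjoint neighborhoods, so $\mathcal F_{(x,y)} = \mathcal F(U_x\cup U_y)$ and $\mathcal F_x=\mathcal F(U_x)$. The sketch below is a toy model under that assumption; it uses presheaves of finite sets rather than groups (enough to test whether the comparison map is bijective), and all names are hypothetical. It compares the sheaf of $\{0,1\}$-valued functions on a two-point discrete space with the constant presheaf, whose comparison map is the diagonal and so is injective but not surjective.

```python
from itertools import product

# Toy model (assumption): a presheaf of finite sets on a finite space is given by
# sections(W) -> list of distinct sections over the open set W, and restrict(s, W, V) for V ⊆ W.

def minimal_open_nbhd(point, opens):
    """Smallest open set containing `point` (intersection of all opens containing it)."""
    result = None
    for u in opens:
        if point in u:
            result = u if result is None else result & u
    return result

def bistalk_map(x, y, opens, sections, restrict):
    """Return (injective, surjective) for the comparison map F_(x,y) -> F_x × F_y."""
    ux, uy = minimal_open_nbhd(x, opens), minimal_open_nbhd(y, opens)
    assert not (ux & uy), "x and y cannot be separated by disjoint open sets"
    w = ux | uy
    source = sections(w)  # bi-stalk = F(U_x ∪ U_y)
    image = {(restrict(s, w, ux), restrict(s, w, uy)) for s in source}
    injective = len(image) == len(source)
    surjective = len(image) == len(sections(ux)) * len(sections(uy))
    return injective, surjective

OPENS = [frozenset(), frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})]  # discrete {x, y}

def function_sections(w):
    """Sheaf of {0,1}-valued functions: a section is a frozenset of (point, value) pairs."""
    pts = sorted(w)
    return [frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))]

def function_restrict(s, w, v):
    return frozenset((p, val) for p, val in s if p in v)

def constant_sections(w):
    """Constant presheaf {0,1} on nonempty opens, a one-point set on the empty set."""
    return [0, 1] if w else [None]

def constant_restrict(s, w, v):
    return s if v else None

print(bistalk_map("x", "y", OPENS, function_sections, function_restrict))  # (True, True)
print(bistalk_map("x", "y", OPENS, constant_sections, constant_restrict))  # (True, False)
```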
Here is a perhaps slightly more sophisticated version of this idea. We can use $\mathcal F$ to define some new presheaves on $X\times X$. We will define them on basic sets of the form $U\times V$. There is an evident diagram of presheaves
$$\begin{array}{ccc} {\mathcal F}(U \cup V) & \to & {\mathcal F}(U)\\ \downarrow & & \downarrow \\ {\mathcal F}(V) & \to & {\mathcal F}(U\cap V) \end{array}$$
If $\mathcal F$ is a sheaf then this is a pullback square for every $U, V$. Define $\mathcal F_2$ to be the total homotopy fiber of this square, i.e., the homotopy fiber of the homomorphism from the initial corner to the pullback of the rest. We may want to think of $\mathcal F_2$ as a presheaf of chain complexes on $X\times X$. The cohomology of the associated sheaf is an invariant that measures the deviation of $\mathcal F$ from being a sheaf (roughly speaking; see the next paragraph). If this invariant vanishes, one can construct similar invariants of higher order by looking at higher "cross-effects" of $\mathcal F$.

In fact, the restriction of $\mathcal F_2$ to the diagonal is trivial (on a basic set $U\times U$ all four maps in the square are identities), so we really want to consider cohomology relative to the diagonal. Also, there is a $\Sigma_2$ symmetry to this set-up, and we probably want to consider equivariant cohomology.
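In a toy setting, the condition that the square $\mathcal F(U\cup V)\to\mathcal F(U),\ \mathcal F(V)\to\mathcal F(U\cap V)$ be a pullback can be tested directly, before any homotopy fibers enter. The sketch below is not the construction of $\mathcal F_2$: it assumes presheaves of finite sets on a finite space, records only strict pullback failures, and all names are hypothetical. It lists the pairs $(U,V)$ of open sets for which $\mathcal F(U\cup V)\to\mathcal F(U)\times_{\mathcal F(U\cap V)}\mathcal F(V)$ is not a bijection. For the constant presheaf on a two-point discrete space only the two disjoint pairs fail, and pairs with $U=V$ never do, in line with the triviality of $\mathcal F_2$ on the diagonal.

```python
from itertools import product

# Toy model (assumption): presheaves of finite sets on a finite space, given by
# sections(W) -> list of distinct sections and restrict(s, W, V) for V ⊆ W.

def pullback_failures(opens, sections, restrict):
    """Pairs (U, V) where F(U ∪ V) -> F(U) ×_{F(U ∩ V)} F(V) is not a bijection."""
    failures = []
    for u, v in product(opens, repeat=2):
        union, inter = u | v, u & v
        fiber_product = {(a, b) for a in sections(u) for b in sections(v)
                         if restrict(a, u, inter) == restrict(b, v, inter)}
        source = sections(union)
        image = {(restrict(s, union, u), restrict(s, union, v)) for s in source}
        # The image always lies in the fiber product; bijective iff injective and equal to it.
        if len(image) != len(source) or image != fiber_product:
            failures.append((u, v))
    return failures

OPENS = [frozenset(), frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})]  # discrete {x, y}

def function_sections(w):
    """Sheaf of {0,1}-valued functions: a section is a frozenset of (point, value) pairs."""
    pts = sorted(w)
    return [frozenset(zip(pts, vals)) for vals in product((0, 1), repeat=len(pts))]

def function_restrict(s, w, v):
    return frozenset((p, val) for p, val in s if p in v)

def constant_sections(w):
    """Constant presheaf {0,1} on nonempty opens, a one-point set on the empty set."""
    return [0, 1] if w else [None]

def constant_restrict(s, w, v):
    return s if v else None

print(pullback_failures(OPENS, function_sections, function_restrict))  # no failures for the sheaf
print(pullback_failures(OPENS, constant_sections, constant_restrict))  # the pairs ({x},{y}) and ({y},{x})
```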