There are two ways a presheaf can fail to be a sheaf.
- It has local sections that should patch together to give a global section, but don't.
- It has non-zero sections which are locally zero.
Once the problems are divided into these two classes, it is easy to see what sheafification does: it adds the missing sections from the first problem, and it throws away the extra sections from the second.
Sections of the latter kind tend to be easier to notice, but they are less common. Usually, when a construction or functor must be sheafified, it is because it has local sections that should patch together but don't.
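To make the two failure modes concrete, here is a minimal Python sketch, assuming a finite topological space on which a presheaf is given by explicit lists of sections and a restriction function. The function `sheaf_defects` and the toy presheaf `G` are hypothetical illustrations, not a standard library. For one open set and one cover, it reports pairs of distinct sections that agree locally (the second failure) and compatible families that do not glue (the first failure). The toy `G` lives on the two-point discrete space and has a non-zero global section that is locally zero.

```python
from itertools import product


def sheaf_defects(sections, res, U, cover):
    """Check the two sheaf axioms for one open set U and one cover of U.

    sections: dict mapping each open set (a frozenset) to the list of its sections.
    res(V, W, s): restriction of a section s over V to an open W contained in V.
    Returns (locally_equal, unglued):
      locally_equal: pairs of distinct sections over U that agree on every member of the cover;
      unglued: compatible families on the cover that no section over U restricts to.
    """
    locally_equal = [(s, t) for s in sections[U] for t in sections[U]
                     if s != t and all(res(U, V, s) == res(U, V, t) for V in cover)]
    unglued = []
    for family in product(*(sections[V] for V in cover)):
        pieces = list(zip(cover, family))
        # Compatible: any two local sections agree on the overlap of their opens.
        compatible = all(res(V, V & W, s) == res(W, V & W, t)
                         for (V, s), (W, t) in product(pieces, repeat=2))
        glued = any(all(res(U, V, g) == s for V, s in pieces) for g in sections[U])
        if compatible and not glued:
            unglued.append(family)
    return locally_equal, unglued


# Hypothetical toy: the discrete space X = {p, q}, covered by {p} and {q}.
X, P, Q, EMPTY = frozenset("pq"), frozenset("p"), frozenset("q"), frozenset()

# A presheaf of groups with G(X) = Z/2 and G = 0 on every smaller open set.
G = {X: [0, 1], P: [0], Q: [0], EMPTY: [0]}


def restrict_G(V, W, s):
    return s if W == V else 0  # identity on G(V), zero map to any smaller open


print(sheaf_defects(G, restrict_G, X, [P, Q]))  # ([(0, 1), (1, 0)], [])
```

Here the non-zero global section is reported as locally equal to $0$ and nothing fails to glue, so the sheafification of `G` only has to throw a section away.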
A simple example of a presheaf with this property is the presheaf $F_{p=q}$ of continuous functions on the circle $S^1$ which have the same value at two distinct points $p,q\in S^1$: over an open set $U$ containing both points its sections are the continuous functions on $U$ with equal values at $p$ and $q$, and over any other open set there is no condition. When I restrict to an open neighborhood of $p$ that doesn't contain $q$, the condition on the values goes away. Because the same is true for open neighborhoods of $q$ which don't contain $p$, the condition on the functions in this presheaf has no effect on sufficiently small open sets. It follows that this presheaf is locally the same as the sheaf of continuous functions. Therefore, for any continuous function on $S^1$ which has different values at $p$ and $q$, I can restrict it to an open cover, for instance $\{S^1\setminus\{p\},\,S^1\setminus\{q\}\}$, on which each local section is in $F_{p=q}$; these local sections agree on the overlap, but the only function they could glue to is not in $F_{p=q}$. This is why $F_{p=q}$ is not a sheaf.
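The failure of gluing for $F_{p=q}$ can be imitated in a finite toy model. This sketch is only an analogy under stated assumptions: $S^1$ is replaced by the discrete space $\{p,q\}$, continuous functions by arbitrary functions to $\{0,1\}$, and the cover by $\{p\}$ and $\{q\}$, whose overlap is empty (on the circle, $S^1\setminus\{p\}$ and $S^1\setminus\{q\}$ do overlap, but the condition still disappears on each of them). The variable names are hypothetical.

```python
from itertools import product

# Hypothetical discrete stand-in for F_{p=q}: S^1 is replaced by the two-point
# discrete space {p, q} and continuous functions by all functions to {0, 1}.
# A function on {p, q} is encoded as the pair (value at p, value at q).
VALUES = (0, 1)

# Global sections of the toy F_{p=q}: functions taking equal values at p and q.
global_sections = [(a, b) for a, b in product(VALUES, repeat=2) if a == b]

# Over {p} and {q} separately there is no condition, and their overlap is empty,
# so every pair of local sections is a compatible family.
compatible_families = list(product(VALUES, repeat=2))

# Compatible families that do not come from any global section: gluing fails.
unglued = [fam for fam in compatible_families if fam not in global_sections]
print(unglued)  # [(0, 1), (1, 0)]
```

The two unglued families are exactly the functions with different values at $p$ and $q$.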
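When we sheafify, we simply add in all these sections and obtain the full sheaf of continuous functions. This is clear: on open sets missing $p$ or $q$ the presheaf $F_{p=q}$ already is the sheaf of continuous functions, sheafification does not change it there, and any two sheaves which agree locally are the same (by "agree locally" I mean that the sections and restriction maps agree on all sufficiently small open sets).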
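On a finite topological space the sheafification can be computed by hand, because every point $x$ has a smallest open neighborhood $U_x$, so the stalk at $x$ is just $F(U_x)$. A section of the sheafification over $U$ is then a family $(s_x)_{x\in U}$ with $s_x\in F(U_x)$ and $s_x|_{U_y}=s_y$ whenever $y\in U_x$. The sketch below (function and variable names are hypothetical) applies this to a discrete toy version of $F_{p=q}$, in which the circle is replaced by the two-point discrete space $\{p,q\}$ and continuous functions by functions to $\{0,1\}$.

```python
from itertools import product


def sheafify_sections(U, minimal_open, sections, res):
    """Sections over U of the sheafification of a presheaf on a finite space.

    minimal_open[x] is the smallest open set containing x, so the stalk at x is
    sections[minimal_open[x]].  A section of the sheafification over U is a family
    (s_x) for x in U with s_x in that stalk and res(U_x, U_y, s_x) == s_y whenever
    y lies in U_x.
    """
    points = sorted(U)
    result = []
    for family in product(*(sections[minimal_open[x]] for x in points)):
        s = dict(zip(points, family))
        if all(res(minimal_open[x], minimal_open[y], s[x]) == s[y]
               for x in points for y in minimal_open[x]):
            result.append(s)
    return result


# Hypothetical toy F_{p=q} on the discrete space {p, q} (every point is open).
X, P, Q = frozenset("pq"), frozenset("p"), frozenset("q")
minimal_open = {"p": P, "q": Q}
F = {X: [{"p": v, "q": v} for v in (0, 1)],  # equal values at p and q
     P: [{"p": v} for v in (0, 1)],
     Q: [{"q": v} for v in (0, 1)]}


def restrict(V, W, s):
    return {x: s[x] for x in W}


new_global = sheafify_sections(X, minimal_open, F, restrict)
print(len(F[X]), "->", len(new_global))  # 2 -> 4
```

Each new global section is a pair of local sections, one over $\{p\}$ and one over $\{q\}$, so the two functions with different values at $p$ and $q$ have been added.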
This situation really does come up in practice. Consider the map $S^1\rightarrow \infty$, where $\infty$ is the topological space obtained from $S^1$ by identifying $p$ and $q$ (a figure eight, hence the notation). If I pull back the sheaf of continuous functions on $\infty$ in the naive way, the resulting presheaf on $S^1$ is $F_{p=q}$. To get a sheaf, we need to sheafify.
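A toy version of this pullback can also be computed. In the sketch below, which is only an analogy, the map $S^1\to\infty$ is replaced by a map $f$ from the discrete space $\{p,q\}$ to a one-point space that identifies $p$ and $q$. "The naive way" is taken (as an assumption) to mean the presheaf inverse image $U\mapsto\varinjlim_{V\supseteq f(U)}G(V)$, with each section turned into a function on $U$ by composing with $f$; in a finite space this colimit is just $G$ evaluated at the smallest open set containing $f(U)$. All names are hypothetical.

```python
# Hypothetical finite stand-in for S^1 -> infinity: the map f sends both points of the
# discrete space {p, q} to the single point "*", i.e. it identifies p and q.
VALUES = (0, 1)
f = {"p": "*", "q": "*"}
Y_OPENS = [frozenset(), frozenset("*")]  # open sets of the one-point target space

# Sheaf of functions to {0, 1} on the target; a function is a dict point -> value.
G = {V: ([dict.fromkeys(V, c) for c in VALUES] if V else [{}]) for V in Y_OPENS}


def smallest_open_containing(S, opens):
    # In a finite space the intersection of all open sets containing S is open,
    # so the colimit of G(V) over open V containing S is just G of this set.
    result = frozenset().union(*opens)
    for V in opens:
        if S <= V:
            result &= V
    return result


def naive_pullback(U):
    """Presheaf inverse image: sections of G over the smallest open set containing f(U),
    each turned into a function on U by composing with f."""
    V = smallest_open_containing(frozenset(f[x] for x in U), Y_OPENS)
    return [{x: t[f[x]] for x in sorted(U)} for t in G[V]]


for U in (frozenset("pq"), frozenset("p"), frozenset("q")):
    print(sorted(U), naive_pullback(U))
```

Over $\{p,q\}$ only the two functions with equal values at $p$ and $q$ appear, while over $\{p\}$ and $\{q\}$ every function appears, so the result is the toy version of $F_{p=q}$, which is not a sheaf.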
$\def\sh#1{\mathcal{#1}}\def\csheaf#1{\underline{#1}}\def\on#1{\operatorname{#1}}$First of all, if $X$ is an irreducible scheme (or, more generally, any irreducible topological space), then all of its nonempty open subsets are connected and there are no complications such as the ones you describe. However, the complications disappear on close examination in any case: we are comparing sheaves of abelian groups with sheaves of $\sh{O}$-modules, and the same gluing property that alters the sections of the constant sheaf on $\mathbb{Z}$ also alters the sections of its modules, in such a way that nothing goes wrong. Here are the details:
Given a sheaf of abelian groups $\sh{F}$ and a sheaf of rings $\sh{O}$, the structure of an $\sh{O}$-module on $\sh{F}$ is the same as a homomorphism of sheaves of rings $\sh{O} \to \sh{Hom}(\sh{F}, \sh{F})$, where the target is the sheaf of group homomorphisms from $\sh{F}$ to itself. If $\sh{O}$ is the constant sheaf on a ring $R$, then it is the sheafification of the constant presheaf $\csheaf{R}$ on $R$; since $\sh{Hom}(\sh{F}, \sh{F})$ is already a sheaf, the universal property of sheafification says that it suffices to give a map of presheaves of rings $\csheaf{R} \to \sh{Hom}(\sh{F}, \sh{F})$ (this extends uniquely to $\sh{O}$). But the sections of $\csheaf{R}$ are just $R$ over every open set, even disconnected ones, so such a map is the same as a ring homomorphism $R \to \on{Hom}(\sh{F}|_U, \sh{F}|_U)$ for each open $U$, compatible with restrictions. In particular this gives a collection of ring homomorphisms $R \to \on{Hom}(\sh{F}(U), \sh{F}(U))$ compatible with restrictions, and the two kinds of data are equivalent, since a map of sheaves is just a collection of maps on sections compatible with restrictions. Of course, the last kind of data is the same thing as an $R$-module structure on each $\sh{F}(U)$ making all restriction maps $R$-linear, i.e. an $R$-module structure on $\sh{F}$, as you want.
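The chain of identifications in the argument above can be displayed in one line of reasoning. Here all homomorphisms are homomorphisms of rings or of (pre)sheaves of rings, the families $\rho$ and $\sigma$ are notation introduced only for this display, and $\operatorname{res}_{U,V}\colon\mathcal{F}(U)\to\mathcal{F}(V)$ is the restriction map for $V\subseteq U$. The first step is the universal property of sheafification, the second uses $\underline{R}(U)=R$ for every open $U$, and the third uses that a map of sheaves is a compatible family of maps on sections.

$$
\begin{aligned}
\operatorname{Hom}_{\mathrm{Sh}}\big(\mathcal{O},\ \mathcal{Hom}(\mathcal{F},\mathcal{F})\big)
&\cong \operatorname{Hom}_{\mathrm{PSh}}\big(\underline{R},\ \mathcal{Hom}(\mathcal{F},\mathcal{F})\big)\\
&\cong \big\{(\rho_U\colon R\to \operatorname{Hom}(\mathcal{F}|_U,\mathcal{F}|_U))_{U}\ :\ \rho_U(r)|_V=\rho_V(r)\ \text{for all } V\subseteq U,\ r\in R\big\}\\
&\cong \big\{(\sigma_U\colon R\to \operatorname{Hom}(\mathcal{F}(U),\mathcal{F}(U)))_{U}\ :\ \operatorname{res}_{U,V}\circ\sigma_U(r)=\sigma_V(r)\circ\operatorname{res}_{U,V}\ \text{for all } V\subseteq U,\ r\in R\big\}\\
&\cong \big\{R\text{-module structures on every } \mathcal{F}(U)\ \text{making every } \operatorname{res}_{U,V}\ R\text{-linear}\big\}.
\end{aligned}
$$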
For presheaves (of sets or groups) we know what this particular (or any) colimit operation is: apply the operation objectwise (for each $U$). Now, sheafification preserves colimits (it is a left adjoint), so applying sheafification to a colimit cocone of presheaves yields a colimit cocone of sheaves. Sheafifying the presheaves that are already sheaves does nothing to them, while, by right exactness, it does the correct thing to the colimit. This proves that sheafifying the colimit computed in presheaves is the correct way to compute the colimit in sheaves, and in this argument we did use the right exactness of sheafification essentially. That the sheafification step is actually necessary does not follow from general nonsense: there are examples where we accidentally do not need it and examples where we do. For limit constructions on sheaves we never need sheafification, because the embedding of sheaves into presheaves is left exact (indeed it is a right adjoint, so it preserves all limits); hence we can simply compute limits in presheaves. It seems you had somehow the opposite impression.
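A small finite example illustrates both halves of the last paragraph: an objectwise colimit that is not a sheaf, and an objectwise limit that is. This is a hypothetical toy, not taken from the question. On the discrete space $\{p,q\}$, take the sheaves $F$ and $G$ of functions with values in $\{a_1,a_2\}$ and $\{b\}$, and compare their objectwise coproduct and product as presheaves of sets. On this space the sheaf condition reduces to having exactly one section over $\emptyset$ and to restriction $S(X)\to S(\{p\})\times S(\{q\})$ being a bijection.

```python
from itertools import product

# Hypothetical toy: the discrete space X = {p, q}; its open sets are X, {p}, {q}, {}.
X, P, Q, EMPTY = frozenset("pq"), frozenset("p"), frozenset("q"), frozenset()


def functions(U, values):
    """All functions U -> values, each encoded as a hashable tuple of (point, value)."""
    pts = sorted(U)
    return [tuple(zip(pts, vs)) for vs in product(values, repeat=len(pts))]


def restrict(s, W):
    return tuple((x, v) for x, v in s if x in W)


# Sheaves F (functions to {a1, a2}) and G (functions to {b}), combined objectwise.
def objectwise_coproduct(U):
    # Disjoint union, each section tagged with the summand it comes from.
    return ([("F", s) for s in functions(U, ("a1", "a2"))]
            + [("G", s) for s in functions(U, ("b",))])


def coproduct_res(sec, W):
    return (sec[0], restrict(sec[1], W))


def objectwise_product(U):
    return list(product(functions(U, ("a1", "a2")), functions(U, ("b",))))


def product_res(sec, W):
    return (restrict(sec[0], W), restrict(sec[1], W))


def is_sheaf(sections, res):
    """Sheaf condition on discrete {p, q}: exactly one section over the empty set, and
    restriction sections(X) -> sections({p}) x sections({q}) is a bijection."""
    images = {(res(g, P), res(g, Q)) for g in sections(X)}
    return (len(sections(EMPTY)) == 1
            and len(images) == len(sections(X)) == len(sections(P)) * len(sections(Q)))


print(is_sheaf(objectwise_coproduct, coproduct_res),
      is_sheaf(objectwise_product, product_res))  # False True
```

The objectwise coproduct has two sections over $\emptyset$ and only $5$ global sections instead of $3\cdot 3=9$; its sheafification is the sheaf of functions with values in $\{a_1,a_2,b\}$, which has $9$ global sections. The objectwise product needs no correction.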