That categorical definition is for pre-sheaves; the topological definition is for sheaves.
In topological pre-sheaves, a map is surjective if it is epimorphic for each open set $U$ in $X$.
In topological sheaves, however, we instead have to "sheaf-ify" the definition: we say that the map is "surjective" if the sheaf-ification of its cokernel pre-sheaf is zero.
Basically, you have two categories, $\mathcal{Sh}$ and $\mathcal{PSh}$. In $\mathcal{PSh}$, the "surjective" maps are the ones that are epimorphisms on each $U$, but in the $\mathcal{Sh}$ category you have a more complicated definition of "surjective" (or "epimorphism").
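To make the difference concrete, here is a standard example (added as an illustration, not taken from the discussion above). Assume $X=\mathbb C\setminus\{0\}$, let $\mathcal O$ be the sheaf of holomorphic functions on $X$, $\mathcal O^*$ the sheaf of nowhere-vanishing holomorphic functions, and $\exp:\mathcal O\to\mathcal O^*$ the map $f\mapsto e^f$. Writing $\operatorname{coker}^{\mathrm{pre}}$ for the cokernel pre-sheaf $U\mapsto \mathcal O^*(U)/\exp(\mathcal O(U))$:

$$
\begin{aligned}
U = X:&\quad z \in \mathcal{O}^*(X),\ \text{but } z \neq e^{f} \text{ for every } f \in \mathcal{O}(X) &&\Rightarrow\ \operatorname{coker}^{\mathrm{pre}}(\exp)(X) \neq 0,\\
U \text{ a disc in } X:&\quad \text{every } g \in \mathcal{O}^*(U) \text{ equals } e^{f} \text{ for some } f \in \mathcal{O}(U) &&\Rightarrow\ \operatorname{coker}^{\mathrm{pre}}(\exp)(U) = 0.
\end{aligned}
$$

Since discs form a basis of the topology, every stalk of the cokernel pre-sheaf is zero, so its sheaf-ification is zero. Thus $\exp$ is an epimorphism in $\mathcal{Sh}$ even though it is not surjective on the open set $U=X$, and so not an epimorphism in $\mathcal{PSh}$.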
Consider, instead, two categories, $\mathcal{Ab}$ the category of abelian groups, and $\mathcal{AbTF}$, the full subcategory of "torsion-free" abelian groups - that is, the abelian groups, $A$, where for any $n\in\mathbb Z$ and $a\in A$, $na=0$ iff $n=0$ or $a=0$.
There is the natural inclusion functor $\mathcal{AbTF}\to\mathcal{Ab}$ and a left adjoint sending $A\mapsto A/N(A)$, where $N(A)$ is the torsion subgroup of $A$ (the subgroup of elements of finite order).
But in $\mathcal{AbTF}$, the "epimorphisms" are not the ones with cokernel (in $\mathcal{Ab}$) $0$; they are the ones whose cokernels (in $\mathcal{Ab}$) are torsion groups. So, for example, in $\mathcal{Ab}$ the morphism $\mathbb Z\to\mathbb Z$ sending $x\mapsto 2x$ is not an epimorphism, but that same map, when considered as a map in $\mathcal{AbTF}$, is an epimorphism.
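The doubling example can be checked directly from the definition of an epimorphism. As a short sketch, write $f:\mathbb Z\to\mathbb Z$ for $x\mapsto 2x$ (the names $f$, $g$, $h$, $C$ are introduced here only for illustration), and let $g,h:\mathbb Z\to C$ be homomorphisms with $g\circ f=h\circ f$. If $C$ is torsion-free, then for every $x\in\mathbb Z$:

$$
2\bigl(g(x)-h(x)\bigr) = g(2x)-h(2x) = (g\circ f)(x)-(h\circ f)(x) = 0 \quad\Longrightarrow\quad g(x)=h(x).
$$

So $f$ is right-cancellable in $\mathcal{AbTF}$. In $\mathcal{Ab}$ the same argument fails: take $C=\mathbb Z/2\mathbb Z$, $g$ the quotient map, and $h=0$. Then $g\circ f=0=h\circ f$ but $g\neq h$.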
So consider the "sheafification" functor $\mathcal{PSh}\to \mathcal{Sh}$ to be much like the functor $\mathcal{Ab}\to\mathcal{AbTF}$.
(I believe, but don't quote me, that $f:A\to B$ in $\mathcal{AbTF}$ is an epimorphism if and only if $f\otimes \mathbb Q:A\otimes \mathbb Q\to B\otimes\mathbb Q$ is an epimorphism in $\mathcal{Ab}$.)
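A sketch of why this should hold, assuming the criterion that $f$ is an epimorphism in $\mathcal{AbTF}$ exactly when $C=B/f(A)$ is a torsion group: the sequence $A\to B\to C\to 0$ is exact in $\mathcal{Ab}$, and tensoring with $\mathbb Q$ is right exact, so

$$
A\otimes\mathbb Q \xrightarrow{\ f\otimes\mathbb Q\ } B\otimes\mathbb Q \longrightarrow C\otimes\mathbb Q \longrightarrow 0
$$

is again exact. Hence $f\otimes\mathbb Q$ is surjective (equivalently, an epimorphism in $\mathcal{Ab}$) if and only if $C\otimes\mathbb Q=0$, and $C\otimes\mathbb Q=0$ holds if and only if every element of $C$ has finite order, i.e. $C$ is torsion.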
The general result is as follows (which can be found in almost every category theory textbook): Let $\mathcal{D}$ be a reflective subcategory of a category $\mathcal{C}$, i.e. the inclusion has a left adjoint $L : \mathcal{C} \to \mathcal{D}$. Then, if a diagram $(X_i)$ in $\mathcal{D}$ has a colimit $\mathrm{colim}_i X_i$ in $\mathcal{C}$, then $L(\mathrm{colim}_i X_i)$ is its colimit in $\mathcal{D}$. The proof is just one line: For $T \in \mathcal{D}$ we have natural bijections
$$\hom(L(\mathrm{colim}_i X_i),T) \cong \hom(\mathrm{colim}_i X_i,T) \cong \lim_i \hom(X_i,T). ~~ \square$$
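As an illustration of this result (a sketch using the torsion-free abelian groups as the reflective subcategory, with $L(A)=A/N(A)$ and $N(A)$ the torsion subgroup), recall that the cokernel of $f:A\to B$ is a colimit, namely the coequalizer of $f$ and $0$. For $A,B$ torsion-free this gives

$$
\operatorname{coker}_{\mathcal{AbTF}}(f)\;\cong\;L\bigl(\operatorname{coker}_{\mathcal{Ab}}(f)\bigr)\;=\;\bigl(B/f(A)\bigr)\big/N\bigl(B/f(A)\bigr).
$$

For $f:\mathbb Z\to\mathbb Z$, $x\mapsto 2x$, the cokernel in $\mathcal{Ab}$ is $\mathbb Z/2\mathbb Z$, which is entirely torsion, so the cokernel in the subcategory is $L(\mathbb Z/2\mathbb Z)=0$. The same pattern applies to sheaves: the cokernel of a map of sheaves is the sheafification of its cokernel computed in pre-sheaves.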
Also, it is well-known that left adjoints only have to be defined on objects; the action on morphisms follows from the universal property. Specifically, for $X \in \mathcal{C}$ we have a universal morphism $X \to L(X)$ with $L(X) \in \mathcal{D}$, and a morphism $f : X \to X'$ is mapped to the unique(!) morphism $L(X) \to L(X')$ such that the evident square commutes.
Best Answer
I have three suggestions:
Mac Lane, S., and Moerdijk, I., "Sheaves in Geometry and Logic: A First Introduction to Topos Theory"
Kashiwara, M., and Schapira, P., "Categories and Sheaves"
The first is my favorite. The second is more advanced and doesn't really start talking about sheaves until late in the book, but it's a quality text nonetheless.
Finally, Angelo Vistoli's notes on descent theory have a nice discussion of sheaves (with algebraic geometry in mind) in the second chapter.