The theory of modular forms arose out of the study of elliptic integrals (as did the theory of elliptic curves, and much of modern algebraic geometry, and indeed much of modern mathematics). People understood that (complete) elliptic integrals (which we would think of as the number obtained by integrating a de Rham cohomology class, e.g. the one associated to the holomorphic differential on an elliptic curve, over a homology class on the curve) depended on an invariant (what we would think of as the $j$-invariant of the elliptic curve, although historically people used other invariants, often depending on some auxiliary
level structure, such as $\lambda$, or $k$ (the square-root of $\lambda$)). This invariant was called the modulus (which is the origin of the adjective modular in this context).
People knew that if you replaced an elliptic curve by an $N$-isogenous one,
then the elliptic integral would be multiplied by $N$ (in terms of $\mathbb C/\Lambda$, the elliptic integral is just one of the basis elements for $\Lambda$,
and multiplying this by $N$, while keeping the other one fixed, gives a new elliptic curve related to the original one by an $N$-isogeny). They asked themselves how they could describe the modulus for this $N$-isogenous elliptic curve (or integral) in terms of the original one. This led them to find explicit equations for the modular curves $X_0(N)$ (for small values of $N$).
With these kinds of investigations (and remember, these were brilliant people --- Jacobi, Kronecker, Klein, just to mention some spanning a good part of the 19th century), it was natural that they were led to modular forms as well as modular functions (as one example, the Taylor coefficients of elliptic functions give modular forms; as another, the coordinates --- say with respect to Weierstrass elliptic functions --- of $N$-torsion points give level $N$ modular forms).
So all these investigations grew out of the study of elliptic integrals, but became intimately connected with the invention of algebraic topology, the development of complex analysis (by Riemann, and then Schwarz, and then the uniformization theorem), the development of hyperbolic geometry; basically all
the fundamental mathematics of the 19th century that then drove much of the development of 20th-century mathematics.
The connections with arithmetic were also observed early on. Jacobi already introduced theta series and saw the relationship with counting representations by quadratic forms (e.g. he proved that the number of ways of writing $n \geq 1$ as a sum of four squares, counting order and signs, is equal to $8\sum_{d | n, 4 \not\mid d} d$, using weight $2$ modular forms on $\Gamma_0(4)$).
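Jacobi's four-square theorem, in its standard form $r_4(n) = 8\sum_{d | n, 4 \not\mid d} d$ for $n \geq 1$, is easy to check numerically for small $n$. The following is only a brute-force sanity check, not anything resembling Jacobi's proof; the function names `r4_brute` and `r4_jacobi` are made up for this illustration.

```python
from itertools import product
from math import isqrt


def r4_brute(n):
    """Count ordered integer 4-tuples (signs included) whose squares sum to n."""
    m = isqrt(n)
    values = range(-m, m + 1)
    return sum(1 for t in product(values, repeat=4)
               if sum(x * x for x in t) == n)


def r4_jacobi(n):
    """Jacobi's divisor-sum formula for r_4(n), valid for n >= 1."""
    return 8 * sum(d for d in range(1, n + 1) if n % d == 0 and d % 4 != 0)


# Compare the two counts for small n.
for n in range(1, 16):
    assert r4_brute(n) == r4_jacobi(n)
```

Note that the count treats representations differing in order or sign as distinct, which is what makes the factor of $8$ appear.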
But Kronecker (and maybe Abel, Eisenstein and even Gauss before him) also knew that modular forms, when evaluated at CM elliptic curves (i.e. at quadratic imaginary values of $\tau$), gave algebraic values in some contexts. Gauss was led to this by the analogy with cyclotomy: $N$-torsion on an elliptic curve was analogous to $N$th roots of $1$ on the unit circle, and the analogy is tighter when the elliptic curve has CM, because then the $N$-torsion points become a cyclic module over the ring of complex multiplications, just as the $N$th roots of $1$ are a cyclic module over $\mathbb Z$ (i.e. a cyclic group).
Kronecker (and again, maybe people before him) realized that CM elliptic curves corresponded to lattices $\Lambda \subset \mathbb C$ that belong to ideal classes in quadratic imaginary fields, and so saw a relationship between CM elliptic curves and class field theory for quadratic imaginary fields (Kronecker's Jugendtraum). This also related to the previous work on evaluating modular forms at CM points.
All this is just to say that even in the 19th century the subject was very deep, and already very connected to number theory, as well as everything else.
Ramanujan knew the theory very well, and discovered new phenomena (e.g. his conjectures on the behaviour of $\tau(n)$, defined by $\Delta = q\prod_{n=1}^{\infty} (1- q^n)^{24} = \sum_{n=1}^{\infty} \tau(n) q^n$). Mordell proved Ramanujan's conjecture on the multiplicative nature of $\tau$, and Hecke introduced his operators to systematize Mordell's method of proof.
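The multiplicativity of $\tau$ can be observed directly from the product formula. Here is a minimal sketch that expands $q\prod_{n \geq 1}(1-q^n)^{24}$ up to a chosen order and tests $\tau(mn) = \tau(m)\tau(n)$ for coprime $m, n$; the helper names `tau_table` and `is_multiplicative` are illustrative only, and a finite check of course proves nothing.

```python
from math import gcd


def tau_table(N):
    """Return {m: tau(m)} for 1 <= m <= N, from Delta = q * prod (1 - q^n)^24."""
    # coeffs[k] is the coefficient of q^k in prod_{n>=1} (1 - q^n)^24, k < N.
    coeffs = [0] * N
    coeffs[0] = 1
    for n in range(1, N):
        for _ in range(24):
            # Multiply in place by (1 - q^n); going down in k means
            # coeffs[k - n] still holds the value from before this factor.
            for k in range(N - 1, n - 1, -1):
                coeffs[k] -= coeffs[k - n]
    # The leading factor q shifts exponents by one.
    return {m: coeffs[m - 1] for m in range(1, N + 1)}


def is_multiplicative(tau, N):
    """Check tau(m*n) == tau(m)*tau(n) for all coprime m, n with m*n <= N."""
    return all(tau[m * n] == tau[m] * tau[n]
               for m in range(1, N + 1)
               for n in range(1, N // m + 1)
               if gcd(m, n) == 1)


tau = tau_table(30)
assert is_multiplicative(tau, 30)
```

For instance, the table gives $\tau(2) = -24$, $\tau(3) = 252$ and $\tau(6) = -6048 = \tau(2)\tau(3)$.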
At this point, the subject moved in a more representation-theoretic and analytic direction, with the generalization to automorphic forms. With the discovery in the 50s, 60s, and 70s of the modularity conjecture for elliptic curves over $\mathbb Q$, and related ideas, the arithmetic theory of modular forms became a central topic again. See this answer on MO for more on that.
Mazur's theorem on torsion points on elliptic curves over $\mathbb Q$ is one of the deepest results that comes from thinking of $X_0(N)$ and $X_1(N)$ directly in modular terms. But already the proofs are more automorphic in nature, and are focussed on the relationships between modular forms, particularly Hecke eigenforms, and Galois representations. That's where the modern focus primarily lies. You can see some of the other answers linked from my webpage (here) for more on that.
Let me close this long discussion by just saying that the passage to Galois representations as a focus is a natural development from Kronecker's Jugendtraum, but reflects a shifting of attention from abelian class field theory for quadratic imaginary fields to non-abelian (more precisely, $\mathrm{GL}_2$) class field theory for $\mathbb Q$. (Note that the former embeds in the latter, since the induction of a Galois character of a quadratic extension gives a two-dimensional representation of $G_{\mathbb Q}$.)
Finally, let me mention that the main theme of Mazur's article is congruences between cuspforms and Eisenstein series (this is what the Eisenstein ideal measures), and so it's hard to have one without the other. (In some sense, Eisenstein series are like the trivial Dirichlet character mod $N$, while cuspforms are like the non-trivial characters. Which is more important depends on what you are doing; in many problems you need to consider both.)
Yes, the statement about monomorphisms is true in any category. Your proof is correct.
As you said, the statement for epimorphisms is not always true. For example, in the category of Hausdorff topological spaces, let $b$ have dense image but not be surjective (this is an epimorphism) and let $a$ have image contained in the complement of the image of $b$. Then the fiber product $A\times_{C}B$ is empty, so $p$ won't be an epimorphism unless $A$ is empty.
In abelian categories, pullbacks of epimorphisms are always epimorphisms. More generally, the notion you need is that of a regular category in which every epimorphism is regular (i.e. the coequalizer of some pair of morphisms). In a regular category, regular epimorphisms always pull back to regular epimorphisms by definition.
Besides abelian categories, the category of sets is also regular. Moreover, all epis of sets are regular. This explains your last remark about the category Set.
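For finite sets the statement can be checked by hand or by machine. The sketch below builds the fiber product $A\times_{C}B$ explicitly and tests whether the projection $p$ to $A$ is surjective; the sets, maps and function names are all made up for the illustration.

```python
def pullback(A, B, a, b):
    """Fiber product A x_C B = {(x, y) : a(x) == b(y)} for finite sets A, B."""
    return {(x, y) for x in A for y in B if a(x) == b(y)}


def projection_to_A_is_surjective(A, P):
    """Is the projection p: P -> A, (x, y) |-> x, onto A?"""
    return {x for (x, _) in P} == set(A)


A = {"u", "v"}
B = set(range(6))
a = {"u": 0, "v": 2}.get          # a: A -> C = {0, 1, 2}


def b_onto(y):                    # b: B -> C, surjective
    return y % 3


def b_const(y):                   # b: B -> C, not surjective
    return 0


# A surjective b pulls back to a surjective p.
assert projection_to_A_is_surjective(A, pullback(A, B, a, b_onto))
# A non-surjective b can miss the image of a, and then p is not surjective.
assert not projection_to_A_is_surjective(A, pullback(A, B, a, b_const))
```

The second case is the set-theoretic shadow of the Hausdorff counterexample: the point `"v"` of $A$ maps outside the image of $b$, so nothing in the fiber product lies over it.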
Let me elaborate on the comment above and turn it into an answer. There are many ways to define meromorphic differentials, maybe the simplest of which is "rational sections of the cotangent bundle". To be more precise, if $V \subseteq X$ is an open subset, and $U \subseteq V$ is such that $V \setminus U$ consists of isolated points, then $\omega \in \Omega^1_{hol}(U)$ is a meromorphic differential on $V$ if it has at worst poles (no essential singularities) at the points of $V \setminus U$. Alternatively, one could simply define a meromorphic differential form on $V$ using charts: define meromorphic differential forms on open subsets of $\mathbb{C}$ as $f(q)\, dq$ with $f$ meromorphic, and say that $\omega$ is a meromorphic differential form on $V$ if for any local parametrization $\phi : W \rightarrow V$ with $W \subseteq \mathbb{C}$ open, $\phi^{*} \omega$ is meromorphic. This is the approach taken in Diamond and Shurman.
The space denoted $\Omega^{\otimes n}(X)$ is not the $n$-th symmetric power, but the $n$-th tensor power. When we have two line bundles, we can tensor them to obtain a new line bundle. We do that $n$ times for the line bundle of meromorphic differentials. In more explicit notation
$$ (dq)^n = (dq)^{\otimes n} = (dq)\otimes \ldots \otimes (dq). $$
If $\varphi : X \rightarrow Y$ is holomorphic, with local coordinate $q_1$ on $X$ and $q_2$ on $Y$, then pulling back satisfies
$$ \varphi^*(\omega_1 \otimes \ldots \otimes \omega_n) = \varphi^*(\omega_1) \otimes \ldots \otimes \varphi^*(\omega_n) $$
and we obtain
\begin{align*} \varphi^*(f(q_2)(dq_2)^{\otimes n}) &= \varphi^*(f(q_2)dq_2) \otimes (\varphi^*(dq_2))^{\otimes (n-1)} \\ &= (f(\varphi(q_1)) \varphi'(q_1) dq_1) \otimes (\varphi'(q_1)dq_1)^{\otimes (n-1)} \\ &= f(\varphi(q_1))(\varphi'(q_1))^n (dq_1)^{\otimes n} \end{align*}
where we have used the chain rule to pull back each differential.
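To see why this formula matters for modular forms, here is a worked instance, under the assumption (not stated above) that $X = Y$ is the upper half-plane with coordinate $\tau$ and that $\varphi = \gamma$ is the Möbius transformation $\tau \mapsto (a\tau+b)/(c\tau+d)$ attached to a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z})$. Since $\gamma'(\tau) = (ad-bc)/(c\tau+d)^2 = (c\tau+d)^{-2}$, the pullback computation gives
$$ \gamma^*\big(f(\tau)(d\tau)^{\otimes n}\big) = f(\gamma\tau)\,(c\tau+d)^{-2n}\,(d\tau)^{\otimes n}, $$
so $f(\tau)(d\tau)^{\otimes n}$ is invariant under $\gamma$ exactly when $f(\gamma\tau) = (c\tau+d)^{2n} f(\tau)$, which is the weight $2n$ transformation law.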