[Math] How were modular forms discovered

ho.history-overviewmodular-formsnt.number-theory

When modular forms are usually introduced, it is by: "We have the standard action of $SL(2,\mathbb Z)$ on the upper half-plane, so let us study functions which are (almost) invariant under such transformations".
But modular forms were discovered in the 19th century, before group theory was available. Therefore, I'd like to ask how were modular forms originally discovered?
In his book The 1-2-3 of modular forms D. Zagier writes that this was the reason:

Proposition 21 Let $f(z)$ be a modular form of weight $k$ on some group $\Gamma$ and $t(z)$ a modular function with respect to $\Gamma$. Express $f(z)$ locally as $\Phi(t(z))$. Then the function $\Phi(t)$ satisfies a linear differential equation of order $k+1$ with algebraic coefficients

After the proof (actually, 3 of them) there is an example: The function $\sqrt[4] E_4$ is formally a mod. form of weight one, which means:

$\sqrt[4] E_4(z)=F(\frac 1 {12}, \frac 5 {12}, 1; 1728/j(z))=1+\frac {60} {j(z)}+…$

He states that this is a classical identity found in the works of Fricke and Klein. This, however, sparks a few questions:
1) I know that hypergeometric series were investigated in that time, but how did they find this one and similar ones?
2) Did they find something special about the resulting functions, like the $SL(2,\mathbb Z)$ transformation law?

Best Answer

You don't need the language of group theory to talk about some aspects of groups. For example, number theorists going back to Fermat were studying the group of units mod $m$ (including things like the order of a unit mod $m$) and geometers were studying groups of motions in space long before anyone defined a "group".

The first modular forms (of level $4$, not level $1$) were found by Gauss in his work on the arithmetic-geometric mean around 1800, and it would take until the end of the 19th century for the term "modular form" to be introduced, in 1890. I provided links for this near the end of my answer here, but I don't want to discuss this further.

Instead I want to show how one reason for interest in modular forms in the 19th century was the construction of nonconstant meromorphic functions on Riemann surfaces of higher genus. This will explain why someone might care about functions satisfying a transformation rule like $f((a\tau+b)/(c\tau+d)) = (c\tau+d)^kf(\tau)$.

Compact Riemann surfaces of genus 1 arise in the form $\mathbf C/L$ where $L$ is a discrete subgroup of $\mathbf C$. Meromorphic functions on $\mathbf C/L$ are the elliptic functions, which in various forms were studied throughout much of the 19th century (by Jacobi, Weierstrass, et al.). We'd like to find a similar story for compact Riemann surfaces of genus greater than $1$. By letting a discrete group $Γ \subset {\rm SL}_2(\mathbf R)$ act on the upper half-plane $\mathfrak h$ by linear fractional transformations, the coset space $\Gamma\setminus\mathfrak h$ will be a compact Riemann surface (with finitely many points missing) and the higher-genus Riemann surfaces arise in this way. A natural collection of discrete subgroups of ${\rm SL}_2(\mathbf R)$ is ${\rm SL}_2(\mathbf Z)$ and its finite-index subgroups, such as the congruence subgroups. When I write $\Gamma$ below, you could consider it to be one of these groups of integral matrices.

How can we create nonconstant meromorphic functions on $\Gamma\setminus\mathfrak h$? That is the same thing as creating nonconstant meromorphic functions $F : \mathfrak h \rightarrow \mathbf C$ that are $\Gamma$-invariant: $F(\gamma\tau) = F(\tau)$ for all $\gamma \in \Gamma$ and $\tau \in \mathfrak h$. If we aren't clever enough to be able to write down $\Gamma$-invariant functions directly, we can still make progress by finding some non-invariant functions as long as they are non-invariant in the same way: if $$ f\left(\frac{aτ + b}{cτ + d}\right) = (cτ + d)^kf(τ) $$ and $$ g\left(\frac{aτ + b}{cτ + d}\right) = (cτ + d)^kg(τ) $$ for some "weight" $k \in \mathbf Z$ and all $(\begin{smallmatrix} a& b\\c &d\end{smallmatrix}) ∈ \Gamma$ and $\tau ∈ \mathfrak h$, then the ratio $f(τ)/g(τ)$ is $\Gamma$-invariant: $$ \frac{f((a\tau+b)/(c\tau+d))}{g((a\tau+b)/(c\tau+d))} = \frac{(cτ + d)^kf(τ)}{(cτ + d)^kg(τ)} = \frac{f(\tau)}{g(\tau)}. $$ As long as $f$ and $g$ are not constant multiples of each other (essentially, the space of modular forms $f$ and $g$ live in is more than one-dimensional), the ratio $f/g$ will be a nonconstant $\Gamma$-invariant function.

But why should we use fudge factors of the form $(cτ + d)^k$? Suppose for a function $f : \mathfrak h \rightarrow \mathbf C$ that $f(γτ)$ and $f(τ)$ are always related by a nonvanishing factor determined by $γ ∈ Γ$ and $τ ∈ \mathfrak h$ alone, not by $f$: $$ f(γτ) = j(γ, τ)f(τ) $$ for some function $j : \Gamma × \mathfrak h \rightarrow \mathbf C^\times$. If also $g(γτ) = j(γ, τ)g(τ)$ then $f(γτ)/g(γτ) = f(\tau)/g(\tau)$, so $f/g$ is $\Gamma$-invariant. We want to figure out how anyone might imagine using the fudge factor $j(\gamma,\tau) = (c\tau+d)^k$, where $γ = (\begin{smallmatrix} a& b\\ c& d\end{smallmatrix})$. Since $(γ_1γ_2)τ = γ_1(γ_2τ)$, we have $f((γ_1γ_2)τ) = f(γ_1(γ_2τ))$. This turns the above displayed equation into $$ j(γ_1γ_2,τ)f(τ) = j(γ_1,γ_2τ)f(γ_2τ). $$ Since $f(γ_2τ) = j(γ_2,τ)f(τ)$, the above equation holds if and only if the "cocycle condition" $$j(γ_1γ_2, τ) = j(γ_1,γ_2τ)j(γ_2,τ)$$ holds, which looks a lot like the chain rule $(f_1 ◦ f_2)'(x) = f_1'(f_2(x))f_2'(x)$. This suggests a natural example of the function $j(\gamma,\tau)$ using differentiation: when $γ = (\begin{smallmatrix} a& b\\ c& d\end{smallmatrix})$ and $\tau \in \mathfrak h$, set $$ j(γ, τ ) := \left(\frac{aτ + b}{cτ + d}\right)' = \frac{a(cτ + d) − c(aτ + b)}{(cτ + d)^2} =\frac{ad − bc}{(cτ + d)^2}, $$ and for $γ ∈ {\rm SL}_2(\mathbf R)$ this says $j(γ, τ) = 1/(cτ + d)^2$. When $j(γ,τ)$ satisfies the cocycle condition, so does $j(γ,τ)^m$ for each $m \in \mathbf Z$, which motivates the consideration of the transformation rule for modular forms with factors $(cτ + d)^k$, at least for even $k$ (use $k = -2m$).

The top answer at https://math.stackexchange.com/questions/312515/what-is-the-intuition-between-1-cocycles-group-cohomology shows how the same cocycle condition arises when trying to write down line bundles on an elliptic curve over $\mathbf C$.

Related Question