It's interesting that the coprimality of Fermat numbers was already known in
Goldbach's time. The reason for attributing the proof to Polya is presumably
that such a proof is indicated as an exercise in Polya and Szego (1924). Because of this, Ribenboim, in his Little Book of Big Primes, calls it "Polya's proof." Perhaps the misattribution started there.
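The coprimality rests on the identity $F_0F_1\cdots F_{n-1}=F_n-2$, where $F_n=2^{2^n}+1$. A common divisor of $F_m$ and $F_n$ ($m<n$) therefore divides $2$, and Fermat numbers are odd, so they are pairwise coprime. Each $F_n$ then has a prime factor dividing no other $F_m$, which gives infinitely many primes. The Python sketch below (function names are hypothetical, chosen for illustration) only checks the identity and the coprimality numerically for small $n$. It is not a proof, and it is not meant to reproduce Goldbach's own wording.

```python
from math import gcd, prod


def fermat(n: int) -> int:
    """Return the n-th Fermat number F_n = 2**(2**n) + 1."""
    return 2 ** (2 ** n) + 1


def product_identity_holds(n: int) -> bool:
    """Check F_0 * F_1 * ... * F_{n-1} == F_n - 2 (empty product is 1 for n = 0)."""
    return prod(fermat(k) for k in range(n)) == fermat(n) - 2


def pairwise_coprime(limit: int) -> bool:
    """Check gcd(F_m, F_n) == 1 for all 0 <= m < n < limit."""
    return all(gcd(fermat(m), fermat(n)) == 1
               for n in range(limit) for m in range(n))


if __name__ == "__main__":
    print(all(product_identity_holds(n) for n in range(10)))
    print(pairwise_coprime(10))
```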
[Added later] In the light of the comments that have come in, it now looks to me
as though: 1. Goldbach could have observed that he had a proof of the infinitude of
primes, but didn't care to mention it; and 2. the attribution of this observation
to Polya starts with Hardy.
Re 1. In the 18th century, were people interested in finding new proofs of the
infinitude of primes? For example, when Euler proved that $\sum_p 1/p=\infty$
(paper E72 in the Euler Archive), he did not remark that this gives a new proof of
the infinitude of primes. It could very well be that Goldbach did not consider it
interesting to prove again that there are infinitely many primes.
Re 2. One should bear in mind that Hardy knew Polya well. Polya visited him in England just after the publication of Polya and Szego, and collaborated with him on the book Inequalities, published in 1934 (four years before Hardy and Wright's An Introduction to the Theory of Numbers). So Hardy could well have learned the proof directly from Polya.
EDIT. Here is the part of the answer that has been rewritten:
We give below a short proof of the Fundamental Theorem of Galois Theory (FTGT) for finite degree extensions. We derive the FTGT from two statements, denoted (a) and (b). These two statements, and the way they are proved here, go back at least to Emil Artin (precise references are given below).
The derivation of the FTGT from (a) and (b) takes about four lines, but I haven't been able to find these four lines in the literature, and all the proofs of the FTGT I have seen so far are much more complicated. So, if you find either a mistake in these four lines or a trace of them in the literature, please let me know.
The argument is essentially taken from Chapter II of Emil Artin's Notre Dame Lectures [A]. More precisely, statement (a) below is implicitly contained in the proof of Theorem 10, page 31, of [A], in which the uniqueness up to isomorphism of the splitting field of a polynomial is verified. Artin's proof in fact shows that, when the roots of the polynomial are distinct, the number of automorphisms of the splitting extension coincides with the degree of the extension. Statement (b) below is proved as Theorem 14, page 42, of [A]. The proof given here (using Artin's argument) was written with Keith Conrad's help.
Theorem. Let $E/F$ be an extension of fields, and let $a_1,\dots,a_n$ be distinct generators of $E/F$ such that the product of the $X-a_i$ is in $F[X]$. Then
(i) the group $G$ of automorphisms of $E/F$ is finite;
(ii) there is a bijective correspondence between the sub-extensions $S/F$ of $E/F$ and the subgroups $H$ of $G$, and we have
$$
S\leftrightarrow H\iff H=\text{Aut}(E/S)\iff S=E^H\implies[E:S]=|H|,
$$
where $E^H$ is the subfield of $E$ fixed by $H$, where $[E:S]$ is the degree (that is, the dimension) of $E$ over $S$, and where $|H|$ is the order of $H$.
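As an illustration (my own example, not taken from Artin), take $F=\mathbb{Q}$ and $E=\mathbb{Q}(\sqrt2,\sqrt3)$. The four numbers $\pm\sqrt2,\pm\sqrt3$ are distinct generators of $E/F$, and the product of the $X-a_i$ is $(X^2-2)(X^2-3)\in\mathbb{Q}[X]$, so the theorem applies. Writing $\sigma$ for the automorphism that changes the sign of $\sqrt2$ and fixes $\sqrt3$, and $\tau$ for the one that fixes $\sqrt2$ and changes the sign of $\sqrt3$, we get $G=\{1,\sigma,\tau,\sigma\tau\}$ and the correspondence reads:

$$
\begin{array}{c|c|c|c|c|c}
H & \{1\} & \langle\sigma\rangle & \langle\tau\rangle & \langle\sigma\tau\rangle & G\\ \hline
S=E^H & E & \mathbb{Q}(\sqrt3) & \mathbb{Q}(\sqrt2) & \mathbb{Q}(\sqrt6) & \mathbb{Q}\\
[E:S]=|H| & 1 & 2 & 2 & 2 & 4
\end{array}
$$

Since $G$ is the Klein four-group, these five subgroups are all of its subgroups, so the table lists every sub-extension of $E/\mathbb{Q}$.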
PROOF
We claim:
(a) If $S/F$ is a sub-extension of $E/F$, then $[E:S]=|\text{Aut}(E/S)|$.
(b) If $H$ is a subgroup of $G$, then $|H|=[E:E^H]$.
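Proof that (a) and (b) imply the theorem. Taking $S=F$ in (a) shows that $G$ is finite, of order $[E:F]$. Let $S/F$ be a sub-extension of $E/F$ and put $H:=\text{Aut}(E/S)$. Then we trivially have $S\subset E^H$, and (a) and (b) imply
$$
[E:S]=|H|=[E:E^H].
$$
Since $S\subset E^H$ and $[E:S]=[E:E^H]\,[E^H:S]$, this forces $S=E^H$.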
Conversely, let $H$ be a subgroup of $G$ and set $\overline H:=\text{Aut}(E/E^H)$. Then we trivially have $H\subset\overline H$, and (a) and (b) imply $|H|=[E:E^H]=|\overline H|$; since both groups are finite, $H=\overline H$.
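Proof of (a). Since $[E:S]$ is finite, every $S$-embedding of $E$ in $E$ is surjective, so $\text{Aut}(E/S)$ is the set of extensions of the inclusion $S\subset E$ to embeddings of $E$ in $E$. Let $1\le i\le n$. Put $K:=S(a_1,\dots,a_{i-1})$ and $L:=K(a_i)$. By induction on $i$ and multiplicativity of degrees, it suffices to check that any $F$-embedding $\phi$ of $K$ in $E$ has exactly $[L:K]$ extensions to an $F$-embedding $\Phi$ of $L$ in $E$; or, equivalently, that the polynomial $p\in\phi(K)[X]$ which is the image under $\phi$ of the minimal polynomial of $a_i$ over $K$ has $[L:K]$ distinct roots in $E$. But this is clear: the minimal polynomial of $a_i$ over $K$ divides the product of the $X-a_j$, whose coefficients lie in $F$ and are fixed by $\phi$, so $p$ also divides this product, which has $n$ distinct roots in $E$.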
Proof of (b). Since $H\subset\text{Aut}(E/E^H)$, statement (a) gives $|H|\le[E:E^H]$, so it is enough to check $|H|\ge[E:E^H]$. Let $k$ be an integer larger than $|H|$, and pick $b=(b_1,\dots,b_k)\in E^k$.
We must show that the $b_i$ are linearly dependent over $E^H$, or equivalently that $b^\perp\cap(E^H)^k$ is nonzero, where $\bullet^\perp$ denotes the set of vectors orthogonal to $\bullet$ in $E^k$ with respect to the dot product on $E^k$. If $x\in(E^H)^k$ satisfies $x\cdot b=0$, then for any $h\in H$ we get $x\cdot hb=h(x\cdot b)=0$; hence any element of $b^\perp\cap(E^H)^k$ is orthogonal to $hb$ for every $h\in H$, so
$$
b^\perp\cap(E^H)^k=(Hb)^\perp\cap(E^H)^k,
$$
where $Hb$ is the $H$-orbit of $b$. We will show that $(Hb)^\perp\cap(E^H)^k$ is nonzero. Since the span of $Hb$ in $E^k$ has $E$-dimension at most $|H|<k$, $(Hb)^\perp$ is nonzero. Choose a nonzero vector $x$ in $(Hb)^\perp$ with as many zero coordinates as possible among all nonzero vectors in $(Hb)^\perp$. Some coordinate $x_j$ is nonzero, so after scaling we can assume $x_j=1$. Since the subspace $(Hb)^\perp$ of $E^k$ is stable under the action of $H$, for any $h$ in $H$ we have $hx\in(Hb)^\perp$, so $hx-x\in(Hb)^\perp$. The vector $hx-x$ vanishes at every coordinate where $x$ does, and also at the $j$-th coordinate since $x_j=1$; by the choice of $x$, it must therefore be $0$. Since this holds for all $h$ in $H$, $x$ is a nonzero element of $(Hb)^\perp\cap(E^H)^k$.
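Statement (b) can be checked by brute force in a small case. The sketch below is my own choice of example, not part of Artin's argument. It takes $F=\mathrm{GF}(2)$ and $E=\mathrm{GF}(16)$, built as $\mathrm{GF}(2)[x]/(x^4+x+1)$ (the theorem applies, since the 16 elements of $E$ are distinct generators and the product of the $X-a$ is $X^{16}-X\in F[X]$). It assumes the standard fact that $G$ is cyclic of order 4, generated by the Frobenius map $a\mapsto a^2$. For each subgroup $H$ it computes $E^H$ and compares $|H|$ with $[E:E^H]$. All names (`gf_mul`, `frob`, `subgroup`, ...) are hypothetical helpers.

```python
# Assumption: E = GF(16) = GF(2)[x]/(x^4 + x + 1), F = GF(2), G = <frob>, frob(a) = a^2.
MOD = 0b10011  # x^4 + x + 1, irreducible over GF(2)
N = 4          # [E : F]


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of E encoded as 4-bit integers (bit i = coefficient of x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # degree reached 4: reduce modulo x^4 + x + 1
            a ^= MOD
    return result


def frob(a: int, times: int = 1) -> int:
    """Apply the Frobenius automorphism a -> a^2 `times` times."""
    for _ in range(times):
        a = gf_mul(a, a)
    return a


def subgroup(d: int) -> set[int]:
    """Subgroup H of G generated by frob^d, stored as exponents e (meaning frob^e) modulo N."""
    return {(k * d) % N for k in range(N)}


def fixed_field(H: set[int]) -> list[int]:
    """E^H: the elements of E fixed by every frob^e with e in H."""
    return [a for a in range(2 ** N) if all(frob(a, e) == a for e in H)]


def degree_over_fixed_field(H: set[int]) -> int:
    """[E : E^H]: if |E^H| = 2^m then [E^H : F] = m, hence [E : E^H] = N / m."""
    m = len(fixed_field(H)).bit_length() - 1
    return N // m


if __name__ == "__main__":
    for d in range(1, N + 1):
        H = subgroup(d)
        print(f"H = <frob^{d}>: |H| = {len(H)}, [E:E^H] = {degree_over_fixed_field(H)}")
```

For every subgroup the two numbers agree, as (b) predicts; for instance the subgroup of order 2 fixes exactly the four elements of $\mathrm{GF}(4)\subset E$.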
[A] Emil Artin, Galois Theory, Lectures Delivered at the University of Notre Dame, Chapter II.
PDF version: http://www.iecl.univ-lorraine.fr/~Pierre-Yves.Gaillard/DIVERS/Selected_Texts/st.pdf
Here is the part of the answer that has not been rewritten:
Although I'm very interested in the history of Galois Theory, I know almost nothing about it. Here are a few things I believe; please correct me if I'm wrong. My main source is
http://www-history.mcs.st-and.ac.uk/history/Projects/Brunk/Chapters/Ch3.html
Artin was the first mathematician to formulate Galois Theory in terms of a lattice anti-isomorphism.
The first publication of this formulation was van der Waerden's "Moderne Algebra", in 1930.
The first publications of this formulation by Artin himself were "Foundations of Galois Theory" (1938) and "Galois Theory" (1942).
Artin himself doesn't seem to have ever explicitly claimed this discovery.
Assuming all this is true, my (probably naive) question is:
Why does somebody who makes such a revolutionary discovery wait so many years before publishing it?
I also hope this is not completely unrelated to the question.
It is very difficult to find a paper of Hurwitz dealing in any way with the geometry or topology of $\mathbb{H}^3/PSL_2(\mathcal{O}_{\mathbb{Q}(\sqrt{-D})})$. (And I am not sure I like what this implies about our ways of keeping knowledge alive, attribution and such.) The two closest things I could find were the following:
L. Bianchi. Geometrische Darstellung der Gruppen linearer Substitutionen mit ganzen complexen Coefficienten nebst Anwendungen auf die Zahlentheorie. Math. Ann. 38 (1891), 313-333.
The main purpose of the investigation seems to have been reduction theory (and classification) of quadratic forms. The description of the fundamental domain is based on explicit generators for the group. The discussion of the topology is based on Poincaré's hyperbolic space model, the explicit transformation formulas for the group action, and the metric of hyperbolic space.
On page 2 of Bianchi's paper there is a reference to
A. Hurwitz: Über die Entwicklung complexer Grössen in Kettenbrüche. Acta Math. 11 (1888), 187-200.
That paper is mostly about continued fractions and discusses what could possibly be interpreted as being related to the fundamental domains for $PSL_2(\mathcal{O}_{\mathbb{Q}(\sqrt{-D})})$ for $D=1$ and $D=3$. It seems, however, to link Hurwitz to cusps only rather indirectly.
L. Bianchi. Sui gruppi di sostituzioni lineari con coefficienti appartenenti a corpi quadratici immaginarî. Math. Ann. 40 (1892), 332-412.
In this paper he states the relevant theorem: "Il numero dei vertici singolari eguaglia il numero delle classi degli ideali nel corpo quadratico corrispondente." ("The number of singular vertices equals the number of ideal classes in the corresponding quadratic field.") Singular vertices for him are orbits of boundary points (or boundary points in each fundamental domain), and it seems to me that in §§2 and 3 he essentially proves this result by computing the orbits of $PSL_2(\mathcal{O}_{\mathbb{Q}(\sqrt{-D})})$ on the boundary projective line (as already suggested in various comments).
There is a reference to Hurwitz's work here as well:
A. Hurwitz. Grundlagen einer independenten Theorie der Modulfunctionen. Math. Ann. 18 (1881), 528-592.
For some reason this paper does not seem to be in MathSciNet. Anyway, Bianchi says
"Ma senza riferirci al teorema generale di Poincaré daremo qui una dimostrazione diretta di questa proprietà affatto analoga a quella che il Sig. Hurwitz ha fatto conoscere pel gruppo modulare." ("But without referring to Poincaré's general theorem, we shall give here a direct proof of this property, entirely analogous to the one Mr. Hurwitz made known for the modular group.")
So Bianchi's computation of the fundamental domain does not use Poincaré's general theorem, but proceeds directly, in close analogy with Hurwitz's arguments for the modular group. Hurwitz's treatment of fundamental domains for the modular group and its congruence subgroups (in the above-mentioned paper) seems to consider only the upper half plane, without discussing boundary points.
Conclusion: At least from these two findings, it seems that Hurwitz was not directly involved in the proof that cusps for the Bianchi groups are in bijection with ideal classes. Maybe additional information could be inferred from Hurwitz's mathematical diaries which are available from the ETH library.
It is not clear to me whether Bianchi really did consider the topology of the orbifold. He considered the group $PSL_2(\mathcal{O}_{\mathbb{Q}(\sqrt{-D})})$ (or $PGL_2$) as generated by an explicit set of transformations, and he viewed the group action on $\mathbb{H}^3$ in terms of explicit formulas. From the papers above it seems he did not think about an orbit space, but only about the construction of a fundamental polyhedron containing representatives of all orbits. From this point of view, of course, cusps and the quotient topology are not an issue at all, since everything takes place in $\mathbb{H}^3$ or $\overline{\mathbb{H}^3}$. (The same applies to Hurwitz's treatment of the action of the modular group on the upper half plane.)
In any case, in view of the above references it seems to me more appropriate to credit Bianchi (in 1892) with the identification of cusps and ideal classes.
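Bianchi's theorem says that the number of cusps of $\mathbb{H}^3/PSL_2(\mathcal{O}_{\mathbb{Q}(\sqrt{-D})})$ equals the class number of $\mathbb{Q}(\sqrt{-D})$. To see concretely how many cusps to expect, the Python sketch below computes that class number by counting reduced primitive positive definite binary quadratic forms of the field discriminant. This is Gauss's classical reduction method, my choice of technique and not Bianchi's computation. It assumes $D>0$ is squarefree and uses the standard correspondence between ideal classes of an imaginary quadratic field and classes of such forms. The function names are hypothetical.

```python
from math import gcd


def field_discriminant(D: int) -> int:
    """Discriminant of Q(sqrt(-D)), assuming D > 0 is squarefree."""
    return -D if D % 4 == 3 else -4 * D


def class_number(disc: int) -> int:
    """Count reduced primitive forms a*x^2 + b*x*y + c*y^2 with b^2 - 4ac = disc < 0.

    Reduced means |b| <= a <= c, with b >= 0 whenever |b| == a or a == c.
    """
    if disc >= 0 or disc % 4 not in (0, 1):
        raise ValueError("need a negative discriminant congruent to 0 or 1 mod 4")
    h = 0
    a = 1
    while 3 * a * a <= -disc:          # reduction forces 3a^2 <= |disc|
        for b in range(-a + 1, a + 1):  # -a < b <= a excludes b = -a
            num = b * b - disc          # this must equal 4ac
            if num % (4 * a):
                continue
            c = num // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:  # keep primitive forms only
                h += 1
        a += 1
    return h


if __name__ == "__main__":
    for D in (1, 2, 3, 5, 6, 7, 14, 15, 23):
        print(D, class_number(field_discriminant(D)))
```

For instance, for $D=5$ it returns 2, so the Bianchi orbifold for $\mathbb{Q}(\sqrt{-5})$ should have two cusps, while for $D=1,2,3,7,11,19,43,67,163$ it returns 1.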