I do think Jerry Schirmer answered the question in the comments, but I'll try to expand just to make clear how he explained everything.
Let us take as given that special relativity is correctly described by physics in Minkowski spacetime. Then we can ask ourselves how to include gravity without violating causality, which is required by the finite speed of light.
The idea is to consider Einstein's elevator: there is no local experiment that can distinguish between bodies in free fall in a constant gravitational field and the same bodies uniformly accelerated. That's because gravity affects everything the same way. A formalization of this is called Einstein's equivalence principle (in contrast with Galileo's, which is about coordinate transformations between frames moving at constant relative velocity).
Note first that this is not the case for electromagnetism. One can always use test charges to determine the electromagnetic fields, and it is impossible to do away with them using accelerated frames. Also, the equivalence principle is strictly local. If you look at extended regions, gravity will appear through tidal forces.
So, if you think that special relativity is a particular case of general relativity (because it's just the same without gravity), the question is: what looks locally like special relativity but not globally? The answer is curved Lorentzian manifolds, which are locally Minkowski.
But, as Jerry stressed, thinking of curved manifolds as a generalization of flat ones does not, in principle, say anything about gravity. Only by noticing that gravity is a force unlike any other, and formalizing this through the equivalence principle, can one justify the physics behind the use of curved manifolds. For instance, you suggest it is natural to generalize the situation by allowing curved spaces, but from the mathematical point of view one could just as well argue that there are other forms of generalization, e.g. we could instead try to projectivize Minkowski space. This is indeed useful in other contexts, but it has nothing to do with gravity. So for a physicist it is important that we have "conceptual insights" to guide the process of "generalization for comprehension", or in other words we need principles with physical content.
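To make "locally Minkowski" a bit more concrete, here is a sketch of the standard statement in Riemann normal coordinates $x^\mu$ centred on a point $P$. The notation is my own choice, and the signs depend on the curvature convention used:

$$ g_{\mu\nu}(x) = \eta_{\mu\nu} - \frac{1}{3} R_{\mu\alpha\nu\beta}(P)\, x^\alpha x^\beta + O(x^3) $$

The metric and its first derivatives agree with Minkowski's at $P$, so a small enough laboratory sees special relativity. Curvature only enters at second order in the distance from $P$, and these are the tidal effects: for two nearby free-falling bodies with four-velocity $u^\mu$ and separation $\xi^\mu$, the relative acceleration is given by the geodesic deviation equation

$$ \frac{D^2 \xi^\mu}{d\tau^2} = - R^\mu{}_{\nu\rho\sigma}\, u^\nu \xi^\rho u^\sigma $$

which vanishes identically only when the spacetime is flat.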
I'm really unsure about what Gauss could have been thinking regarding the metric. He did try to formulate classical mechanics in a differential geometrical way (Lanczos's "The Variational Principles of Mechanics" discusses it), but if that's what you're referring to, then it had nothing to do specifically with gravity.
EDIT: Oh boy, that last sentence is very misleading, I'm sorry. I had a look at Lanczos' book and realized that while Gauss pushed for a different formulation of classical mechanics, called the Principle of Least Constraint (page 106 in Lanczos), it was only after some time that Hertz gave the principle its geometrical interpretation. So it is really not relevant to your question. I won't erase the paragraph though, in case anyone is interested.
Also, the equivalence principle argument says nothing about the field equations, and would be true even if the correct equations were different. As a matter of fact, a lot of general relativity is independent of the Einstein field equations, like the causal structure and (to some extent) the singularity theorems. This is why the equivalence principle was formulated as early as 1907 but the field equations came only in 1915.
I'm not a big fan of "what if" questions in history, mainly because they don't seem to have answers, but while Poincaré had the Lorentz transformations and a lot of understanding of special relativity, I never heard of anyone who anticipated the equivalence principle. So I hope this makes it plausible that while others could have done SR, it did not seem likely that GR was coming, because one first needed to understand what gravity is. Nordström's theory is an extension of ideas from electromagnetism and was bound to fail. Hilbert indeed got the field equations right on his own, but would not have gotten there without the motivation of curved spacetimes.
Best Answer
"The Hamiltonian is zero" is not really an interesting statement for reparametrization-invariant theories - the Hamiltonian is generically zero for such theories, see this answer of mine.
The crucial point is that a Hamiltonian theory is more than its "naive" Hamiltonian $H(p,q)$. The Hamiltonian theories that correspond to Lagrangian theories with gauge freedoms are typically constrained (see also this answer of mine), and the action of such a constrained theory looks like $$ S = \int (\dot{q}^ip_i - u^\alpha\chi_\alpha - H)\mathrm{d}t$$ where the $\chi_\alpha$ are the constraints, which must be fulfilled as $\chi_\alpha(p(t),q(t)) = 0$ on-shell classically and as $\chi_\alpha \lvert \psi(t)\rangle = 0$ quantumly, and the $u^\alpha$ are Lagrange multipliers. Your argument shows $H=0$, but that's just a very elaborate way of showing the general fact that reparametrization-invariant systems have zero Hamiltonians. It does not remove at all the requirement that the constraints of the system need to be implemented. The confusion with the "Hamiltonian constraint" of the Wheeler-DeWitt equation is that the people who talk about the Wheeler-DeWitt equation consider as "the Hamiltonian" the extended Hamiltonian $$ H' = H + u^\alpha \chi_\alpha$$ (the combination subtracted from $\dot{q}^ip_i$ in the action above), so that with $H = 0$ and arbitrary multipliers $u^\alpha$, $H'\lvert \psi\rangle = 0$ requires fulfillment of the constraints.
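As an illustration of how this works in the simplest case, here is a sketch for the free relativistic point particle, which is not part of the original question. I assume signature $(-,+,+,+)$, worldline parameter $\tau$, and a single multiplier $u$. The Lagrangian and momenta are

$$ L = -m\sqrt{-\dot{x}^\mu \dot{x}_\mu}, \qquad p_\mu = \frac{\partial L}{\partial \dot{x}^\mu} = \frac{m\,\dot{x}_\mu}{\sqrt{-\dot{x}^\nu \dot{x}_\nu}} $$

The momenta are not independent: they satisfy the constraint

$$ \chi = p^\mu p_\mu + m^2 = 0 $$

and the naive Hamiltonian vanishes identically,

$$ H = p_\mu \dot{x}^\mu - L = -m\sqrt{-\dot{x}^\mu \dot{x}_\mu} + m\sqrt{-\dot{x}^\mu \dot{x}_\mu} = 0 $$

All of the dynamics therefore sits in the constraint term of the action,

$$ S = \int \left( \dot{x}^\mu p_\mu - u\,(p^\mu p_\mu + m^2) \right) \mathrm{d}\tau $$

and imposing $\chi \lvert \psi \rangle = 0$ with $p_\mu \to -\mathrm{i}\partial_\mu$ gives the Klein-Gordon equation $(\partial^\mu \partial_\mu - m^2)\psi = 0$. So "$H = 0$" says nothing here by itself; the physical content is the constraint.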