I do think Jerry Schirmer answered the question in the comments, but I'll try to expand just to make clear how he explained everything.
Let us take as given that special relativity is correctly described by physics in Minkowski spacetime. We can then ask how to include gravity without violating causality, as required by the finite speed of light.
The idea is to consider Einstein's elevator: no local experiment can distinguish bodies in free fall in a uniform gravitational field from the same bodies seen from a uniformly accelerated frame. That's because gravity affects everything the same way. A rough formalization of this is called Einstein's equivalence principle (in contrast with Galileo's relativity principle, which concerns coordinate transformations between frames moving at constant relative velocity).
Note first that this is not the case for electromagnetism. One can always use test charges to determine the electromagnetic fields, and it is impossible to do away with them using accelerated frames. Also, the equivalence principle is strictly local. If you look at extended regions, gravity will appear through tidal forces.
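To make the remark about tidal forces concrete, here is a minimal Newtonian sketch. The potential $\Phi$ and the separation vector $\xi^i$ between two nearby freely falling particles are notation introduced here for illustration, not part of the argument above. Subtracting $\ddot{x}^i = -\partial_i \Phi(x)$ from the same equation evaluated at $x+\xi$ gives, to first order in $\xi$,

$$
\ddot{\xi}^i = -\frac{\partial^2 \Phi}{\partial x^i\,\partial x^j}\,\xi^j .
$$

The uniform part of the field drops out, as in the elevator, but the second derivatives of $\Phi$ survive. This is why the equivalence principle can only hold locally.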
So, if you think of special relativity as a particular case of general relativity (because it's just the same without gravity), the question is: what looks locally like special relativity but not globally? The answer is curved Lorentzian manifolds, which are locally Minkowski.
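One standard way to state "locally Minkowski" more precisely (a sketch; the symbols $g_{\mu\nu}$, $\eta_{\mu\nu}$ and the point $p$ are notation assumed here) is that around any point $p$ one can choose coordinates in which

$$
g_{\mu\nu}(p) = \eta_{\mu\nu}, \qquad \partial_\sigma g_{\mu\nu}(p) = 0 .
$$

The second derivatives of the metric at $p$ cannot in general all be removed by a change of coordinates. They encode the curvature, which is where gravity shows up beyond the strictly local level.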
But, as Jerry stressed, thinking of curved manifolds as a generalization of flat ones does not, in principle, say anything about gravity. Only by noticing that gravity is a force unlike any other, and formalizing this through the equivalence principle, can one justify the physics behind the use of curved manifolds. For instance, you suggest it is natural to generalize the situation by allowing curved spaces, but from the mathematical point of view one could just as well argue that there are other forms of generalization, e.g. we could instead try to projectivize Minkowski space. This is indeed useful in other contexts, but it has nothing to do with gravity. So for a physicist it is important to have "conceptual insights" to guide the process of "generalization for comprehension"; in other words, we need principles with physical content.
I'm really unsure about what Gauss could have been thinking regarding the metric. He did try to formulate classical mechanics in a differential-geometric way (Lanczos' "The Variational Principles of Mechanics" discusses it), but if that's what you're referring to, then it had nothing to do specifically with gravity.
EDIT: Oh boy, that last sentence is very misleading, I'm sorry. I had a look at Lanczos' book and realized that while Gauss pushed for a different formulation of classical mechanics (it's called the Principle of Least Constraint, page 106 in Lanczos), it was only after some time that Hertz gave the principle its geometrical interpretation. So it is really not relevant to your question. I won't erase the paragraph though, in case anyone is interested.
Also, the equivalence principle argument says nothing about the field equations, and would be true even if the correct equations were different. As a matter of fact, a lot of general relativity is independent of the Einstein field equations, like the causal structure and (to some extent) the singularity theorems. This is why the equivalence principle was formulated as early as 1907 but the field equations came only in 1915.
I'm not a big fan of "what if" questions in history, mainly because they don't seem to have answers, but while Poincaré had the Lorentz transformations and a lot of understanding of special relativity, I have never heard of anyone who anticipated the equivalence principle. So I hope this makes it plausible that while others could have done SR, it did not seem likely that GR was coming, because first one needed to understand what gravity is. Nordström's theory is an extension of ideas from electromagnetism and was bound to fail. Hilbert indeed got the field equations right on his own, but would not have gotten there without the motivation of curved spacetimes.
On point i): JohhnyMo1's comment touches the essential point, though the result he quoted holds assuming that the manifold is Hausdorff. I emphasize this because the definition of paracompactness in the literature is not uniform - sometimes it is assumed that a paracompact topological space is Hausdorff, sometimes not (M. W. Hirsch's book "Differential Topology", for instance, doesn't - neither does he assume that a manifold must be Hausdorff, by the way). More precisely, the result is a direct consequence of the Smirnov metrization theorem: a topological space is metrizable if and only if it is Hausdorff, paracompact and locally metrizable (i.e. any point has an open neighborhood whose relative topology is metrizable). Any manifold clearly satisfies the latter condition.
(EDIT: I've just got acquainted with the Smirnov metrization theorem, which allows one to do away with the connectedness hypothesis. Moreover, the counterexample I previously wrote is incorrect)
One should also add that, for Hausdorff spaces, paracompactness is equivalent to the existence of partitions of unity subordinate to any open cover, which allow us to glue together locally defined objects in the manifold - for instance, this is how you prove existence of Riemannian metrics.
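As a sketch of that gluing argument (the charts $\varphi_\alpha : U_\alpha \to \mathbb{R}^n$, the partition of unity $\{\rho_\alpha\}$ subordinate to $\{U_\alpha\}$ and the Euclidean metric $\delta$ are notation assumed here), one pulls back the Euclidean metric in each chart and sums:

$$
g = \sum_\alpha \rho_\alpha \, \varphi_\alpha^{*}\delta .
$$

At each point this is a finite convex combination of positive definite bilinear forms, hence positive definite. The same trick does not work directly for Lorentzian metrics, since a convex combination of Lorentzian forms need not be Lorentzian, which is why their existence involves a genuine topological condition.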
On point ii): if by "compact" you mean "compact without boundary" (like $S^n$), compact space-times indeed necessarily have vanishing Euler characteristic - conversely, any compact manifold with vanishing Euler characteristic admits a time-oriented Lorentzian metric. However, such space-times are not physically interesting because they necessarily have closed timelike curves. The argument is simple: since any space-time $(\mathscr{M},g)$ may be covered by the chronological futures of all its points (which are open sets), using compactness one can pass to a finite subcover, say $\mathscr{M}=I^+(p_1)\cup\cdots\cup I^+(p_n)$. Therefore, $p_1$ must belong to $I^+(p_{j_1})$ for some $j_1=1,\ldots,n$, $p_{j_1}$ must belong to $I^+(p_{j_2})$ for some $j_2=1,\ldots,n$, and so on. Since we are dealing with a finite number of points, eventually some point must repeat, i.e. $p_{j_l}=p_{j_k}$ for some $0\le k<l\le n$ (writing $j_0=1$); by transitivity of the chronology relation this point lies in its own chronological future, thus producing a closed timelike curve. Since such space-times are not globally hyperbolic, they are also unsuitable for the analysis of hyperbolic (i.e. wave-like) PDEs. Noncompact (Hausdorff, connected and paracompact, as in point (i)) manifolds, on the other hand, always admit a time-oriented Lorentzian metric.
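Written out, the closing step of the argument is a pigeonhole count. Here $p \ll q$ is shorthand (introduced here, not used above) for $q \in I^+(p)$, and $j_0 = 1$. Among the $n+1$ indices $j_0, j_1, \ldots, j_n$ two must coincide, say $j_k = j_l$ with $k < l$, so that

$$
p_{j_l} \ll p_{j_{l-1}} \ll \cdots \ll p_{j_{k+1}} \ll p_{j_k} = p_{j_l}
\quad\Longrightarrow\quad
p_{j_k} \in I^+(p_{j_k}) .
$$

Concatenating the future-directed timelike curves along this chain gives a closed timelike curve through $p_{j_k}$. Note that the repeated point need not be $p_1$ itself.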
A reference that discusses which topological hypotheses on space-time manifolds are natural is the classic book by S. W. Hawking and G. F. R. Ellis, "The Large Scale Structure of Space-Time" (Cambridge University Press, 1973).
Best Answer
As asked in the comments, here is one answer:
One formalism where it is somewhat common to expand the Einstein equations into a full set of equations is the Newman-Penrose formalism. It is not that widely used, since it works with spinors instead of tensors and with a somewhat unusual basis containing complex null vectors, but it should give an idea of the whole thing.
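For orientation, here is a sketch of the null tetrad the formalism is built on, in the signature $(+,-,-,-)$ commonly used with it; the signs flip in the opposite signature, so check the convention of whatever reference you follow. The basis consists of two real null vectors $l^a$, $n^a$ and a complex null vector $m^a$ with its conjugate $\bar{m}^a$, normalized so that

$$
l_a n^a = 1, \qquad m_a \bar{m}^a = -1, \qquad \text{all other contractions vanish},
$$

$$
g_{ab} = l_a n_b + n_a l_b - m_a \bar{m}_b - \bar{m}_a m_b .
$$

The field equations are then written as scalar equations for the components of the connection and curvature in this basis, which is what makes the expanded set of equations manageable.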
https://en.wikipedia.org/wiki/Newman%E2%80%93Penrose_formalism#NP_field_equations