You seem to believe that it is somehow contradictory to have a set model of ZFC inside another model of ZFC. But this belief is mistaken.
As Gerald Edgar correctly points out, the Completeness Theorem of first order logic asserts that if a theory is consistent (i.e. proves no contradiction), then it has a countable model. To be sure, the proof of the Completeness Theorem is fairly constructive, for the model is built directly out of the syntactic objects (Henkin constants) in an expanded language. To re-iterate, since you have mentioned several times that you find something problematic with it:
- Completeness Theorem. (Goedel 1929) If a theory is consistent, then it has a model that is a set.
The converse is much easier, so in fact we know that a theory is consistent if and only if it has a set model. This is the answer to your question.
More generally, if a theory is consistent, then the upward Lowenheim-Skolem theorem shows that it has models of every larger cardinality, all of which are sets. In particular, since the language of set theory is countable, it follows that if ZFC is consistent, then it has models of any given (set) cardinality.
The heart of your confusion appears to be the mistaken belief that somehow there cannot be a model M of ZFC inside another structure V satisfying ZFC. Perhaps you believe that if M is a model of ZFC, then it must be closed under all the set-building operations that exist in V. For example, consider a set X inside M. For M to satisfy the Power Set axiom, perhaps you might think that M must have the full power set P(X). But this is not so. All that is required is that M have a set P, which contains as members all the subsets of X that exist in M. Thus, M's version of the power set of X may be much smaller than the power set of X as computed in V. In other words, the concept of being the power set of X is not absolute between M and V.
Different models of set theory can disagree about the power set of a set they have in common, and about many other things, such as whether a given set is countable, whether the Continuum Hypothesis holds, and so on. Some of the most interesting arguments in set theory work by analyzing and taking advantage of such absoluteness and non-absoluteness phenomenon.
Perhaps your confusion about models-in-models arises from the belief that if there is a model of ZFC inside another model of ZFC, then this would somehow mean that we've proved that ZFC is consistent. But this also is not right. Perhaps some models of ZFC have models of ZFC inside them, and others think that there is no model of ZFC. If ZFC is consistent, then these latter type of models definitely exist.
- Incompleteness Theorem. (Goedel 1931) If a (representable) theory T is consistent (and sufficiently strong to interpret basic arithmetic), then T does not prove the assertion "T is consistent". Thus, there is a model of T which believes T is inconsistent.
In particular, if ZFC is consistent, then there will be models M of ZFC that believe that there is no model of ZFC. In the case that ZFC+Con(ZFC) is consistent, then some models of ZFC will have models of ZFC inside them, and some will believe that there are no such models.
A final subtle point, which I hesitate to mention because it can be confusing even to experts, is that it is a theorem that every model M of ZFC has an object m inside it which M believes to be a first order structure in the language of set theory, and if we look at m from outside M, and view it as a structure of its own, then m is a model of full ZFC. This was recently observed by Brice Halmi, but related observations are quite classical. The proof is that if M is an ω-model, then it will have the same ZFC as we do in the meta-theory and the same proofs, and so it will think ZFC is consistent (since we do), and so it will have a model. If M is not an ω-model, then we may look at the fragments of the (nonstandard) ZFC inside M that are true in some Vα of M. Every standard fragment is true in some such set in M by the Reflection Theorem. Thus, by overspill (since M cannot see the standard cut of its ω) there must be some Vα in M that satisfies a nonstandard fragment of its ZFC. Such a model m = VαM will therefore satisfy all of the standard ZFC. But M may not look upon it as a model of ZFC, since M has nonstandard axioms which it thinks may fail in m.
Your proof strategy via (1) and (2) is impossible. If $PA\cup\Sigma$ proves that Goodstein's theorem is false, then the proof will have finite length, and so there will be some finite $\Sigma_0\subset\Sigma$ such that $PA\cup\Sigma_0$ proves that Goodstein's theorem is false. This would imply by (1) that Goodstein's theorem is false in the standard model. But Goodstein's theorem holds in the standard model, as Goodstein proved.
A second point is that you may find that there are no specific "natural" models of PA at all other than the standard model. For example, Tennenbaum proved that there are no computable nonstandard models of PA; that is, one cannot exhibit a nonstandard model of PA so explicitly that the addition and multiplication of the model are computable functions. (See this related MO question.) But I do not rule out that there could be natural nonstandard models in other senses.
Best Answer
I'm sure the origin of the term is quite complex. Indeed, as Henry and David have pointed out, it is a very natural choice in this context.
I think the first definition of 'model' (not the first use) in the exact sense currently used in model theory is due to Tarski in O pojȩciu wynikania logicznego (On the concept of following logically; English translation MR1951812; German translation from Polish by Tarski himself). This is the paper where the current definition of logical consequence first appears:
I don't know much about Tarski's choice of terms here, but the commentary to the English translation by M. Stroinska and D. Hitchcock could be enlightening.
It is interesting to note that Tarski published his paper in 1936, half a decade after Gödel's Die Vollständigkeit der Axiome des logischen Funktionenkalküls (The completeness of the axioms of the functional calculus of logic; MR1549799). So it appears that these ideas were already known to members of the Vienna Circle and affiliates.