As I pointed out in the comments, the theory of free monoids is somewhat ill-defined. It is still unclear what your logic is and what you really want, but you have two basic options, which were proposed by Pete Clark and sigfpe. Here are a few additional remarks that may help you sort things out.
Universal Property à la Pete Clark. The free monoid functor $F:Set\to Mon$ is left-adjoint to the forgetful functor $G:Mon \to Set$. This adjunction completely determines the isomorphism classes of free monoids. Moreover, many of the essential properties of free monoids follow directly from properties of sets and adjoint functors.
Alternatively, the bijection between $Set(A,GB)$ and $Mon(FA,B)$ can be described as Pete Clark suggested, without any reference to adjoint functors. This latter option is preferable if your logic can handle the categories $Set$ and $Mon$ but not functors between them.
The obvious advantage of this approach is that this is the true freeness property of free monoids (the fact that word monoids are free is nothing but an interesting accident from this point of view). Other than the fact that $F1$ is a natural number object, there is no mention of natural numbers here. However, I have always found it unnecessarily tedious to show, in this context, that free monoids can be viewed as word monoids.
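For concreteness, the universal property can be sketched in a few lines of Python (my own illustration, not part of the categorical story above: I model the word monoid $FA$ as tuples over $A$, and the names `extend` and `hom` are mine). A function from generators into any monoid extends to a unique monoid homomorphism out of $FA$:

```python
from functools import reduce

# Sketch: model the free monoid FA on a set A as tuples over A,
# with concatenation as multiplication and () as the unit.

def extend(f, unit, mult):
    """Given f : A -> B and a monoid (B, unit, mult), return the unique
    monoid homomorphism FA -> B satisfying hom((a,)) == f(a)."""
    def hom(word):
        return reduce(mult, (f(a) for a in word), unit)
    return hom

# Example target monoid: (int, 0, +); f sends each generator to its length.
f = len
hom = extend(f, 0, lambda x, y: x + y)

w1, w2 = ("ab", "c"), ("de",)
assert hom(w1 + w2) == hom(w1) + hom(w2)  # homomorphism property
assert hom(()) == 0                       # preserves the unit
assert hom(("ab",)) == f("ab")            # restricts to f on generators
```

Here the forward direction of the bijection between $Set(A,GB)$ and $Mon(FA,B)$ is `extend`; the inverse simply restricts a homomorphism to one-letter words.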
Induction Axiom à la sigfpe. The idea here is to attempt to capture freeness via induction. Here is a variant that fixes the minor problem I pointed out in a comment to sigfpe's answer: free monoids are structures $(F,A,e,{\cdot})$ where $(F,e,{\cdot})$ is a monoid and $A \subseteq F$ (understood as a unary predicate) is a distinguished set of generators. The axioms to describe this setup are the usual monoid axioms together with the obvious modifications of sigfpe's two axioms:
- $A(a) \land A(a') \land x\cdot a = x' \cdot a' \to x = x' \land a = a'$,
- $\forall P\,(P(e) \land \forall x,a\,(P(x) \land A(a) \to P(x\cdot a)) \to \forall x\,P(x))$,
where $P$ varies over all unary predicates $P \subseteq F$. In first-order logic, quantification over such $P$ is not possible and the induction axiom must be replaced by the corresponding induction scheme where $P$ varies over all formulas of the language instead. (In higher-order logics without predicate types, you can usually replace $P(x)$ by $f(x) = g(x)$ for appropriate $f$ and $g$.)
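As a sanity check (my own toy code, not part of the axiomatics), one can verify by brute force that the word monoid over a finite alphabet, modeled as Python tuples, satisfies the first axiom: the map $(x,a) \mapsto x \cdot a$ is injective.

```python
from itertools import product

def axiom1_holds(alphabet, max_len):
    """Check, over all words of length < max_len, that (x, a) |-> x + (a,)
    is injective -- i.e. A(a) & A(a') & x.a = x'.a'  ->  x = x' & a = a'."""
    words = [tuple(w) for n in range(max_len)
             for w in product(alphabet, repeat=n)]
    pairs = list(product(words, alphabet))
    for (x, a), (x2, a2) in product(pairs, repeat=2):
        if x + (a,) == x2 + (a2,) and not (x == x2 and a == a2):
            return False
    return True

assert axiom1_holds(["a", "b"], 4)
```

The second axiom is then the statement that every word is reachable from the empty word by appending generators, i.e. ordinary structural induction on tuples.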
In the standard semantics for second-order logic, these axioms completely describe free monoids. However, you must be careful, since not all formulations of higher-order logic are equal. Much like first-order logic, some of these formulations will implicitly allow models of the above theory which are not free in the categorical sense. In truth, this is unlikely to be a problem for you, but I am compelled to warn you of the possibility.
Here's a low-tech way to look at it, which to me seems perfectly convincing.
Let C be some implementation of the reals via Cauchy sequences and D be some implementation of the reals via Dedekind cuts. Here C is "really" something like a tuple consisting of the set of reals, a relation corresponding to addition, etc.; D is a tuple with (allegedly) equivalent things implemented differently.
Let P(X) be the proposition that X is a tuple of the right size and that, when considered as an implementation of the real numbers, X satisfies the Birch and Swinnerton-Dyer conjecture. We have a proof -- perhaps a bizarre, incomprehensible, implementation-dependent one -- of P(C), in ZFC.
I claim that (again, in ZFC) P(C) iff P(D). Sketch of proof:

1. Up to canonical isomorphism there is only one complete ordered field.
2. C and D are complete ordered fields.
3. Therefore there is an isomorphism between C and D; in fact we can even write it down.
4. We can use this to build an isomorphism between C's complex numbers and D's complex numbers, and then between C's L-functions and D's L-functions, and C's elliptic curves and D's elliptic curves, and so on for every object required to state the BSD conjecture.
5. If we have a specific elliptic curve over D, these isomorphisms yield its equivalent over C (and vice versa); they pair up the groups of rational points in the two cases, showing that they have the same rank; they pair up the corresponding L-functions, showing that they have the same order of zero at s=1.
6. And we're done.
None of this requires that these isomorphisms be applied to the proof of P(C). That proof can be as C-specific as you like. What the isomorphisms show is that the things BSD says are equal come out the same way however you implement the real numbers.
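Here is a toy analogue of that transfer argument in Python (my illustration, with made-up names: mod-3 addition stands in for the reals and `P` for BSD). A property stated only through a structure's interface holds for one implementation iff it holds for an isomorphic one, however the proof for the first was obtained.

```python
# Two "implementations" C and D of the same structure, an explicit
# isomorphism phi, and a property P stated only through the interface
# (carrier, op, unit). Because P never inspects how elements are
# represented, P(C) iff P(D).

def P(carrier, op, unit):
    """Every element has an inverse -- an interface-only property."""
    return all(any(op(x, y) == unit for y in carrier) for x in carrier)

# C: integers mod 3, implemented with ints.
C = (range(3), lambda x, y: (x + y) % 3, 0)

# D: the same group, implemented with strings.
names = ["zero", "one", "two"]
idx = {s: i for i, s in enumerate(names)}
D = (names, lambda x, y: names[(idx[x] + idx[y]) % 3], "zero")

phi = lambda x: names[x]  # the isomorphism C -> D

# phi is a homomorphism between the two implementations ...
assert all(phi((x + y) % 3) == D[1](phi(x), phi(y))
           for x in range(3) for y in range(3))
# ... so the interface-only property transfers:
assert P(*C) == P(*D)
```

The point of the toy is the same as the point above: nothing in `P` looks at whether an element is an `int` or a `str`, just as nothing in the statement of BSD looks at whether a real number is a Cauchy sequence or a Dedekind cut.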
How do we know that we can actually construct this pile of isomorphisms? By thinking about what objects we need in order to state the BSD conjecture, and how we build them, and noting that nothing in the process cares about "implementation details" of the real numbers. If you're sufficiently confident of your memory, you could do this "negatively" by noting that if, when you were learning about elliptic curves and L-functions, a lecturer had said something like "and of course this is true because the number 1 is just the same thing as the set containing just the empty set", you'd have noticed and been horrified. Otherwise, you can (tediously but straightforwardly) go through the usual textbooks and check that the constructions are all "sane".
EDITED to add:
Although I stand by everything above, I can't escape the feeling that Kevin already knows all that and I'm therefore not answering quite the question he's meaning to ask. Let me put some words into Kevin's mouth:
Yes, yes, of course. Every mathematician who thinks about this stuff at all has something like that mental picture. But what really justifies that breezy confidence that that big pile of isomorphisms is really there? I understand that it feels obvious that none of the machinery you need to state something like the BSD conjecture depends on "implementation details". But this is the sort of thing mathematicians are good at getting wrong. It wasn't until the 20th century that we noticed how many extra axioms you really need to add to Euclid's system to make the proofs in the Elements rigorous. The axiom of comprehension probably seemed obviously innocuous until Bertrand Russell asked whether the set of non-self-membered sets shaves itself. A more isomorphism-y example: it seems transparently obvious that a set $X$ is the same size as $\{\{x\}\,:\,x\in X\}$, but this fails if you work in NF instead of ZFC. Maybe there's some implementation detail no one ever noticed we were assuming. How can we be sure?
Again, personally I'm very confident that I'd have noticed if some implementation detail were being slipped into anything in "normal" mathematics (or at least, I'm as confident as that I'd have noticed any other sort of gap in the proofs -- I don't think there's anything special here), and very confident that if I missed one some of the many many other mathematicians, some of them much smarter than I am, who have read the same textbooks and been to the same lectures would have noticed. But I think Kevin's asking whether there's some simple principle that makes it obvious without any need either to trust that sort of thing, or to check in detail through everything in the textbooks, and I want to be clear that this answer doesn't purport to give one; my feeling is that there couldn't possibly be one, any more than there could be some simple principle that makes it obvious (with the same restrictions) that there are no other logical holes in those same textbooks, and for essentially the same reason.
It is not surprising that some versions of the Archimedean property concern subsets of the order rather than merely elements. The reason is that the Archimedean property is provably not expressible in a first-order manner.
This is because the structure of the reals R, as an ordered field, say (but one can add any structure at all), has elementary extensions to nonstandard models R* which are non-Archimedean. This means that any statement in the language of ordered fields that is true in the reals R will also be true in the nonstandard reals. That such models exist is an elementary application of the Compactness theorem, and one can also construct them directly via the ultrapower construction. One can also control the cofinality of the nonstandard order; for example, one can arrange that every countable subset of R* is bounded. Since all these various nonstandard models R* satisfy exactly the same first-order truths as the standard reals R, but are non-Archimedean, it follows that being Archimedean is not first-order expressible.
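To spell out the compactness argument (my summary of the standard proof): let $T = \mathrm{Th}(R) \cup \{\, c > n : n \in \mathbb{N} \,\}$, where $c$ is a new constant symbol and each $n$ abbreviates the term $1 + \cdots + 1$. Any finite subset of $T$ involves only finitely many of the axioms $c > n$, and so is satisfied in $R$ itself by interpreting $c$ as a sufficiently large real. By compactness, $T$ has a model R*, which satisfies every sentence true in $R$ and yet contains the element $c$ exceeding every standard natural number; hence R* is non-Archimedean.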
Being Archimedean is, of course, second-order expressible, and the usual definition is a second-order definition. As Neel mentions in the comments, the natural numbers are identifiable as the smallest subset of the ordered field containing 0 and closed under the successor operation $n \mapsto n+1$.
If one adds the natural numbers N as a predicate to the original model, so that one is looking at R as an ordered field with a unary predicate holding of the natural numbers, then the nonstandard model R* will include a nonstandard version N* of the natural numbers. This new field R*, which is not Archimedean, will nevertheless appear to be Archimedean relative to the nonstandard natural numbers N*. For example, for any positive x and y in R*, there will be a number n in N* such that nx > y.
Indeed, one can do amazing things along this line. Suppose that V is the entire set-theoretic universe, and let V* be a nonstandard version of it (such as an ultrapower by a nonprincipal ultrafilter on the natural numbers). Inside V*, the structure R* is thought to be the actual real numbers and so V* thinks R* is Archimedean, even though back in V we can see that it is mistaken, precisely because V* is using the wrong set of natural numbers for its conclusion. The model V* simply cannot see the true set of natural numbers sitting inside R*, because it does not have that set.
More generally, one can similarly describe what it should mean for any ordered field F to be Archimedean relative to a subring R. Perhaps this simple idea is the generalization for which you are looking? It mainly amounts to the question of whether the subring is cofinal in the original order.
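To make this precise (my formulation of the suggestion): say an ordered field $F$ is Archimedean relative to a subring $R \subseteq F$ if for every $x \in F$ there is some $r \in R$ with $x < r$; that is, $R$ is cofinal in the order of $F$. The classical Archimedean property is the special case where $R$ is the subring generated by 1, and the situation above says that R* is Archimedean relative to the subring generated by N*, but not relative to the standard natural numbers.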
Thus, it is very natural to look at the possible cofinalities of the orders that arise in ordered fields (or the other types of structures that you consider). For any infinite regular cardinal κ, one may find an elementary extension of the reals R to a nonstandard ordered field R* whose order has a cofinal κ-sequence. To do this, just perform a chain of κ many elementary extensions, each adding new elements on top of the previous model. After κ many steps, the union of the structures will have an order with cofinality exactly κ.
If one only uses the ultrapower construction to build the nonstandard models, however, then there are limits on the resulting cofinality of the order. Understanding these limits is a large part of Shelah's deep work on PCF (possible cofinalities) theory.