I think an important answer is still not present so I am going to type it. This is somewhat standard knowledge in the field of foundations but is not always adequately described in lower level texts.
When we formalize the syntax of formal systems, we often talk about the set of formulas. But this is just a way of speaking; there is no ontological commitment to "sets" as in ZFC. What is really going on is an "inductive definition". To understand this you have to temporarily forget about ZFC and just think about strings that are written on paper.
The inductive definition of a "propositional formula" might say that the set of formulas is the smallest class of strings such that:
Every variable letter is a formula (presumably we have already defined a set of variable letters).
If $A$ is a formula, so is $\lnot (A)$. Note: this is a string with 3 more symbols than $A$.
If $A$ and $B$ are formulas, so is $(A \land B)$. Note: this adds 3 more symbols to the ones in $A$ and $B$.
This definition can certainly be read as a definition in ZFC. But it can also be read in a different way. The definition can be used to generate a completely effective procedure that a human can carry out to tell whether an arbitrary string is a formula (a proof along these lines, which constructs a parsing procedure and proves its validity, is in Enderton's logic textbook).
In this way, we can understand inductive definitions in a completely effective way without any recourse to set theory. When someone says "Let $A$ be a formula" they mean to consider the situation in which I have in front of me a string written on a piece of paper, which my parsing algorithm says is a correct formula. I can perform that algorithm without any knowledge of "sets" or ZFC.
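To make the "completely effective procedure" concrete, here is a minimal sketch of a recognizer for the inductive definition above. The names `is_formula` and `VARIABLES` are my own, not from any particular textbook, and I use ASCII stand-ins (`!` for $\lnot$, `&` for $\land$); this is just one way such a parsing algorithm might look, not the specific procedure from Enderton's book.

```python
# Toy recognizer for the inductive definition of propositional formulas:
# a variable letter, or !(A), or (A&B).  Pure string manipulation.

VARIABLES = {"p", "q", "r"}

def is_formula(s: str) -> bool:
    """Decide whether the string s is a formula, by recursion on length."""
    if s in VARIABLES:
        return True
    # Negation clause: !(A) is a string with 3 more symbols than A.
    if s.startswith("!(") and s.endswith(")"):
        if is_formula(s[2:-1]):
            return True
    # Conjunction clause: (A&B) -- locate the '&' at parenthesis depth 1.
    if s.startswith("(") and s.endswith(")"):
        depth = 0
        for i, ch in enumerate(s):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "&" and depth == 1:
                return is_formula(s[1:i]) and is_formula(s[i + 1:-1])
    return False
```

Note that the recursion always calls itself on strictly shorter strings, which is exactly why the procedure terminates and can be carried out by hand.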
Another important example is "formal proofs". Again, I can treat these simply as strings to be manipulated, and I have a parsing algorithm that can tell whether a given string is a formal proof. The various syntactic metatheorems of first-order logic are also effective. For example, the deduction theorem gives a direct algorithm to convert one sort of proof into another. The algorithmic nature of these metatheorems is not always emphasized in lower-level texts, but it is very important in contexts such as automated theorem proving.
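To illustrate that "is a formal proof" is likewise decidable by string manipulation alone, here is a deliberately tiny sketch of a proof checker. The setup is my own toy system, not any textbook's: a "proof" from a set of premises is a list of strings in which every line is either a premise or follows from two earlier lines by modus ponens, with the implication $A \to B$ written as the string `(A>B)`.

```python
# Toy proof checker: every line must be a premise or follow by modus ponens.

def follows_by_mp(line: str, earlier: list[str]) -> bool:
    """Check whether `line` is B for some earlier line A and earlier (A>B)."""
    for a in earlier:
        if "(" + a + ">" + line + ")" in earlier:
            return True
    return False

def is_proof(lines: list[str], premises: set[str]) -> bool:
    """Decide whether `lines` is a correct proof from `premises`."""
    for i, line in enumerate(lines):
        if line in premises:
            continue
        if follows_by_mp(line, lines[:i]):
            continue
        return False
    return True
```

The checker never interprets the strings; it only compares symbols, which is the sense in which formal proofs can be handled without any recourse to set theory.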
So if you examine a logic textbook, you will see that all the syntactic aspects of basic first order logic are given by inductive definitions, and the algorithms given to manipulate them are completely effective. Authors usually do not dwell on this, both because it is completely standard and because they do not want to overwhelm the reader at first. So the convention is to write definitions "as if" they are definitions in set theory, and allow the readers who know what's going on to read the definitions as formal inductive definitions instead. When read as inductive definitions, these definitions would make sense even to the fringe of mathematicians who don't think that any infinite sets exist but who are willing to study algorithms that manipulate individual finite strings.
Here are two more examples of the syntactic algorithms implicit in certain theorems:
Gödel's second incompleteness theorem actually gives an effective algorithm that can convert any PA-proof of Con(PA) into a PA-proof of $0=1$. So, under the assumption that there is no proof of the latter kind, there is no proof of the former kind.
The method of forcing in ZFC actually gives an effective algorithm that can turn any proof of $0=1$ from the assumptions of ZFC and the continuum hypothesis into a proof of $0=1$ from ZFC alone. Again, this gives a relative consistency result.
Results like the previous two bullets are often called "finitary relative consistency proofs". Here "finitary" should be read to mean "providing an effective algorithm to manipulate strings of symbols".
This viewpoint helps explain where weak theories of arithmetic such as PRA enter into the study of foundations. Suppose we want to ask "what axioms are required to prove that the algorithms we have constructed will do what they are supposed to do?". It turns out that very weak theories of arithmetic are able to prove that these symbolic manipulations work correctly. PRA is a particular theory of arithmetic that is on one hand very weak (from the point of view of stronger theories like PA or ZFC) but at the same time is able to prove that (formalized versions of) the syntactic algorithms work correctly, and which is often used for this purpose.
No, the induction axiom does not construct the natural numbers, but in order for the induction axiom to hold (in addition to the other Peano axioms), your model does need to be isomorphic to the natural numbers. Indeed, it is almost the other way around: the induction axiom does not generate or construct or force there to be infinitely many ordered elements in any model, as the other axioms already do that; rather, the induction axiom makes sure that no model contains any further elements. That is, the induction axiom restricts the model to be 'nothing more' than the natural numbers.
To give at least some intuitive idea about this: any model of the Peano axioms without the induction axiom is already required to contain infinitely many elements arranged in an infinite chain of successors starting with $0$, i.e. something like the natural numbers. To see that, note that we need a '$0$' object (Axiom 1) and that this $0$ object needs a successor (Axiom 6), and that this successor of $0$ cannot be $0$ itself (Axiom 8). So, we need a second object. But this second object needs a successor (Axiom 6), which cannot be $0$ (Axiom 8), but also cannot be itself, for we cannot have two different objects with the same successor (Axiom 7). So, we need a third object ... and this process keeps repeating itself: every time we add the new object that we are forced to add as the successor of the previous one, that new object needs a successor of its own, and that successor cannot be $0$ (Axiom 8), nor can it be any of the already existing objects, for then we would get two different objects with the same successor, violating Axiom 7. So, you could say that the natural numbers (or at least an infinite sequence of successive objects) are already 'constructed' by these four axioms.
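The pigeonhole flavor of this argument can be spot-checked mechanically: on a finite domain there is no total successor function that is both injective (Axiom 7) and never hits $0$ (Axiom 8), so no finite structure satisfies these axioms. The following brute-force check is my own illustration, with my own names; $0$ is interpreted by element `0` of the domain, and totality of `succ` plays the role of Axiom 6.

```python
# Brute-force search for a finite model of the successor axioms.
from itertools import product

def has_model_of_size(n: int) -> bool:
    """Is there a successor function on {0,...,n-1} satisfying Axioms 7 and 8?"""
    domain = range(n)                        # element 0 interprets '0' (Axiom 1)
    for succ in product(domain, repeat=n):   # succ[x] = successor of x (Axiom 6)
        injective = len(set(succ)) == n      # Axiom 7: succ is one-to-one
        zero_not_successor = 0 not in succ   # Axiom 8: 0 is nobody's successor
        if injective and zero_not_successor:
            return True
    return False

# No finite domain works: an injective total function on n elements hits
# all n elements, so it must hit 0 as well.
for n in range(1, 6):
    assert not has_model_of_size(n)
```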
OK, so where does the axiom of induction come in? Well, while the other axioms require any model to contain an infinite sequence of successive elements, they do not rule out additional elements in the model: elements that do not occur in this infinite chain of successors. Indeed, without the axiom of induction you could have as a model $\mathbb{N} \cup \{ apples, bananas \}$ ... as long as you define how the interpretations of the successor, addition and multiplication function symbols apply to $apples$ and $bananas$ ... but that can all be done without problem.
However, the induction axiom says that 'if $0$ has some property $P$, and if $s(n)$ has property $P$ whenever $n$ has property $P$, then all objects have property $P$'. Well, if $0$ has some property $P$, and $s(n)$ has property $P$ whenever $n$ has property $P$, then obviously all the objects that occur somewhere in the infinite chain of successors have property $P$; but how can you say that all the additional objects also have property $P$? You can't, of course. So, the fact that the induction axiom lets us conclude that all objects have property $P$ effectively rules out any of those additional objects in the domain, and hence your model needs to be isomorphic to the natural numbers.
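The $\mathbb{N} \cup \{apples, bananas\}$ model above can be sketched in a few lines. This is my own illustration with my own choice of successor on the extra elements ($apples \mapsto bananas \mapsto apples$, which keeps the successor function injective and keeps $0$ out of its range); the closure check is only a finite spot-check of the standard part, of course, not a proof.

```python
# A nonstandard "model": the naturals plus two junk elements in a 2-cycle.

def succ(x):
    """Successor on the domain N ∪ {apples, bananas}."""
    if isinstance(x, int):
        return x + 1
    return {"apples": "bananas", "bananas": "apples"}[x]

def P(x) -> bool:
    """The property 'x occurs in the chain of successors starting at 0'."""
    return isinstance(x, int)

# The induction hypotheses hold: P(0), and P is closed under successor
# (checked here on an initial segment) ...
assert P(0)
assert all(P(succ(n)) for n in range(100))

# ... yet the extra elements fail P, so the induction axiom is violated.
assert not P("apples") and not P("bananas")
```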
I am agnostic about most questions of the philosophy of mathematics, but in this case, I think Nelson's argument is incoherent: a primitive shepherd who doesn't have any abstract concept of number can count sheep being herded into a fold by making marks on a stick and count the sheep out again by crossing the marks off (and this is how our human notions of cardinal numbers probably developed). There is no circularity involved.
To talk meaningfully about notations like $2 \uparrow\uparrow\uparrow 6$, you must have admitted a notion of definition by recursion that (as an ultrafinitist) Nelson can't accept. It seems to me incoherent for an ultrafinitist to say anything more than "I can't accept that a primitive shepherd could ever have enough sticks".