The problem is that you also do a "meta" induction when proving choice for a finite number of sets and this "meta" induction fails in the transfinite or countable case.
To be more clear, the proof of choice for a number of sets sets usually goes something like this (I'm assuming choice is "the product of nonempty sets is nonempty").
Base case is vacuously true (you're given one nonempty set and want to prove it's nonempty).
Next, assume inductively that for any nonempty sets $X_1,...,X_n$, the product $X_1\times...\times X_n$ is nonempty. Now, given $n+1$ sets, $X_1,...,X_{n+1}$, you want to show their product is nonempty. By the induction hypothesis, there is an element $y\in X_1\times....\times X_n$. Also, there is an element $x_{n+1}\in X_{n+1}$ by assumption. Then $\{\{y,x_{n+1}\}, x_{n+1}\} = (y, x_{n+1})$ is an element of the product.
On the meta level, what you're saying by unraveling the induction is "I have $n+1$ sets $X_1,...,X_{n+1}$, and I know there is an element (which is itself a set because we're doing set theory!) in each $X_i$ which I'll call $x_i$. By using the axiom of pairing once, I can form the set $\{x_1,x_2\}$. By using it again, I can form the set $\{\{x_1,x_2\}, x_1\}$, which is the definition of $(x_1,x_2)$. Using the axiom of pairing two more times, I can form $(x_1, x_2, x_3)$. In general (i.e., by induction), by using the axiom of pairing $2(n+1)$ times, I can form the set $(x_1,x_2,...,x_{n+1})$. This set (element) is a member of $X_1\times...\times X_{n+1}$, and hence this product is nonempty!"
What goes wrong as soon as you try to adapt this to the countable or larger setting is on the meta level. Essentially, you'd have to apply the pairing axiom countably (or more) times to create the set $(x_1,...)$ which "should" be in $X_1\times...\times $. But naive set theory has taught us that just because we think something "should" be a set, doesn't mean it should (see, for example, Russel's Paradox). Thus, we at least cannot trust this proof to work in countable or larger settings. Of course, the failure of this approach doesn't mean there is NO way of proving choice from $ZF$, but it's known from other methods that you cannot prove choice from $ZF$ alone.
The ordinals are what you get when you iterate the operations of 'successor' and 'supremum' indefinitely, much like the natural numbers are what you get when you iterate the sole operator 'successor' indefinitely.
- Start with $0$. Iterating successors we get the natural numbers, which are the finite ordinals:
$$0, 1, 2, 3, \dots $$
- Now take the supremum. We call this ordinal $\omega$. Iterating successors we get a longer sequence:
$$0, 1, 2, \dots, \omega, \omega+1, \omega+2, \dots$$
- The supremum of this sequence is the ordinal $\omega + \omega$. We can then take even more successors:
$$0, 1, \dots, \omega, \omega+1, \dots, \omega + \omega, \omega + \omega + 1, \dots$$
The supremum of this sequence is the ordinal $\omega + \omega + \omega$.
Continuing in this way gives rise to the following ordinals:
$$\omega,\ \omega+\omega,\ \omega+\omega+\omega,\ \omega + \omega + \omega + \omega, \dots$$
So we can take another supremum to obtain the ordinal $\omega \cdot \omega$. Likewise we obtain $\omega \cdot \omega \cdot \omega$, and so on, the supremum of all of which is $\omega^{\omega}$. Then we obtain $\omega^{\omega^{\omega}}$ and so on, the supremum of all of which is called $\varepsilon_0$... and so on.
Continuing even further rise to the countable ordinals. But that itself is a set of ordinals, so it has a supremum, called $\omega_1$. Then we can take its successors $\omega_1+1$ and so on.
The ordinals are precisely the things which can be obtained by iterating the successor operation and taking suprema of sets of ordinals.
More formally, the (von Neumann) ordinals are the elements of the class $\mathrm{Ord}$, which is the closure of $\varnothing$ under the successor operation $x \mapsto x \cup \{ x \}$ and under taking arbitrary unions.
The principle of transfinite induction essentially says that, for a given formula $P(x)$, if $P(0)$ is true, and the truth of $P(\alpha)$ is preserved by taking successors and suprema, then $P(\alpha)$ must be true of all ordinals $\alpha$. (We can omit the $P(0)$ case because $0 = \sup (\varnothing$).)
Best Answer
Gentzen's proof can be formalized in PRA plus "transfinite induction on $\epsilon_0$", provided the latter is phrased in the right way. It does not require set theory of any sort, much less the axiom of choice. As mentioned by spaceisdarkgreen in the comments, this is one reason (among many) that Gentzen's proof is of interest.
However, as a separate point, "PA is consistent" is a $\Pi^0_1$ sentence, and a sentence of that form provable in ZFC is always provable in ZF as well, so even if we look at ZFC there is no way that proving the consistency of PA could require the axiom of choice. Even if choice were used to simplify a ZFC proof that PA is consistent, the use of choice could be eliminated.