The standard way to visualize $\epsilon_0$ is by the Hydra game. Here the elements of $\epsilon_0$ are visualized as isomorphism classes of rooted finite trees. The inequality can be described by the "cutting off heads" rule: The tree $T_1$ is greater than $T_2$ is there is a series of head cuttings which reduces $T_1$ to $T_2$. Writing out the inequality relationship between trees directly is a pain, see my blogpost. If you turn those nested sets into trees in the obvious way, you get the Hydra game.
I am told that most people do not find it intuitive that the Hydra game ends. I find that, once I've played a few rounds (try this applet) I find it "obvious", although writing down an actual proof is still painful.
As far as an actual proof, you should directly show the following: Let $X$ be a totally ordered set. Let $\omega^X$ be the set of functions $X \to \omega$ which are $0$ for almost all $x \in X$, ordered as follows: Let $f$ and $g$ be distinct elements of $\omega^X$ and let $x$ be the greatest element of $x$ for which $f(x)\neq g(x)$. Then $f<g$ if and only if $f(x) < g(x)$. Then $\omega^X$ is well ordered.
So every tower of $\omega$'s is well ordered and, $\epsilon_0$, being the union of all such towers, is also well-ordered.
By the way, you don't ask this, but you might be curious what happens when you try to write out this proof within PA. Recall that PA can't directly talk about subsets of $\omega$. The statement that $\omega$ is well-ordered is encoded as an axiom schema. Let $\phi(x, y_1, y_2, \ldots, y_N)$ be any statement with variables $x$ and $y_i$ running through $\omega$. Then PA has the following axiom:
$$\forall y_1, y_2, \ldots, y_n \in \omega: \left( \exists x \in \omega : \phi(x, y_{\bullet}) \implies \exists x' \in \omega : \left( \phi(x', y_{\bullet}) \wedge \forall x \in \omega \left( \phi(x, y_{\bullet}) \implies x \geq x' \right) \right) \right).$$
Please read this axiom until you understand that, in English, it says "For all $y$'s, if there is some $x$ obeying $\phi$, then there is a least $x$ obeying $\phi$."
Let's call this axiom $W(\phi, \omega)$. We'll use similar notation with $\omega$ replaced by other sets. Here is a challenging and important exercise: Let $X$ be an ordered set. Let $\phi$ be a statement about $\omega^X$, which may have other variables $y_i$ in it. Construct a specific statement $\sigma(\phi)$ about $X$, with other variables $z_i$ running through $X$, such that
$$ W(\sigma(\phi), X) \implies W(\phi, \omega^X) \quad (*).$$
For every specific $\phi$, the statement $(*)$ can be proved in PA. Since $W(\psi, \omega)$ is a axiom of PA for every $\psi$, we can prove $W(\phi, \omega^{\omega^{\ldots^{\omega}}})$ in PA for any $\phi$ and any specific height of tower.
But, in order to show that $\epsilon_0$ is well-ordered, we need to show that $W(\phi, \omega^{\omega^{\ldots^{\omega}}})$ simultaneously for every height of tower. Tracing through the arguments here, you would need to know $W(\phi, \omega)$, $W(\sigma(\phi), \omega)$, $W(\sigma(\sigma(\phi)), \omega)$, $W(\sigma^3(\phi), \omega)$ and so forth. As a human mathematician, that probably doesn't bother you at all. But, in the formal system PA, any proof can only use finitely many axioms. So there is no way to write a proof which uses all of the axioms $W(\sigma^k(\phi), X)$, for all $k$.
Of course, this doesn't show that some more clever argument couldn't prove that $\epsilon_0$ is well-ordered while working with PA; you need the Kirby-Paris theorem for that. (More precisely Kirby-Paris plus Godel shows that, if PA proves $\epsilon_0$ is well-ordered, then PA is inconsistent.) But I find that seeing this obstacle, the need to use infinitely many versions of the well-ordering axiom, clarifies my understanding of what is gong on.
Theorem. The following are equivalent.
The relation on $\mathbb{N}$ computed by your program P is a well-order.
ZF is $\Pi^1_1$-sound.
Proof. You gave the argument for the reverse implication $(2\to 1)$, since you only needed to know that that whenever ZF proves that a certain computable relation is a well-order, that it really is, and these are instances of $\Pi^1_1$-soundness. Conversely, suppose that P computes a well-order. Thus, whenever ZF proves that a specific TM computes a well-order, then it really does compute a well order, for otherwise the ill-foundedness would show up in the relation computed by your program P. I claim that this implies $\Pi^1_1$-soundness. To see this, suppose that ZF proves a $\Pi^1_1$ assertion $\sigma$. It is a basic fact of descriptive set theory that every $\Pi^1_1$-assertion is equivalent to the well-foundedness of a certain explicit computable tree, which via the Kleene-Brouwer order means that $\sigma$ is equivalent to the assertion that a certain computable order, for which an explicit TM encoding can be extracted from the syntactic representation of $\sigma$, is a well-order. Since ZF proved $\sigma$, it follows that ZF proves that this computable order is a well-order. By (1), we've assumed that ZF is correct about that, and so the order really is a well order. And so $\sigma$ really is true. Thus, ZF is $\Pi^1_1$-sound. QED
Best Answer
So, we know from reverse mathematics that nearly all "bread-and-butter" theorems are, suitably encoded, provable in proof-theoretically weak subsystems of second order arithmetic. If a theorem is provable in a system whose proof theoretic ordinal is $\alpha$, then in some sense ordinals larger than $\alpha$ need not come into its proof.
Goodstein's theorem, mentioned in the comments, cannot be proven in PA, so in some way the ordinals up to $\epsilon_0$ are "needed" in its proof. But induction up to $\epsilon_0$ can be expressed in a very non-ordinal way: the consistency of PA + all true $\Pi_1$ sentences implies Goodstein's theorem, so I am not sure there is any satisfying way to formulate the claim that ordinals are "needed" in its proof, in Gowers's sense of "needed" (i.e., we have to teach someone about ordinals before they have any chance of understanding any proof of the theorem).
It sure seems like you need to use ordinals to prove the Cantor-Bendixson theorem (every closed set of reals is the union of a countable set and a perfect set), and indeed the proof-theoretic ordinal needed for reverse mathing it is relatively high, namely $\Gamma_0$. [I take this back! An ordinal-free proof is given in William's answer here.] The graph minor theorem "needs" even larger ordinals, in the sense that it cannot be proven in systems whose proof-theoretic ordinal is less then (I believe) the small Veblen ordinal.