The standard way to visualize $\epsilon_0$ is by the Hydra game. Here the elements of $\epsilon_0$ are visualized as isomorphism classes of rooted finite trees. The inequality can be described by the "cutting off heads" rule: The tree $T_1$ is greater than $T_2$ is there is a series of head cuttings which reduces $T_1$ to $T_2$. Writing out the inequality relationship between trees directly is a pain, see my blogpost. If you turn those nested sets into trees in the obvious way, you get the Hydra game.
I am told that most people do not find it intuitive that the Hydra game ends. I find that, once I've played a few rounds (try this applet) I find it "obvious", although writing down an actual proof is still painful.
As far as an actual proof, you should directly show the following: Let $X$ be a totally ordered set. Let $\omega^X$ be the set of functions $X \to \omega$ which are $0$ for almost all $x \in X$, ordered as follows: Let $f$ and $g$ be distinct elements of $\omega^X$ and let $x$ be the greatest element of $x$ for which $f(x)\neq g(x)$. Then $f<g$ if and only if $f(x) < g(x)$. Then $\omega^X$ is well ordered.
So every tower of $\omega$'s is well ordered and, $\epsilon_0$, being the union of all such towers, is also well-ordered.
By the way, you don't ask this, but you might be curious what happens when you try to write out this proof within PA. Recall that PA can't directly talk about subsets of $\omega$. The statement that $\omega$ is well-ordered is encoded as an axiom schema. Let $\phi(x, y_1, y_2, \ldots, y_N)$ be any statement with variables $x$ and $y_i$ running through $\omega$. Then PA has the following axiom:
$$\forall y_1, y_2, \ldots, y_n \in \omega: \left( \exists x \in \omega : \phi(x, y_{\bullet}) \implies \exists x' \in \omega : \left( \phi(x', y_{\bullet}) \wedge \forall x \in \omega \left( \phi(x, y_{\bullet}) \implies x \geq x' \right) \right) \right).$$
Please read this axiom until you understand that, in English, it says "For all $y$'s, if there is some $x$ obeying $\phi$, then there is a least $x$ obeying $\phi$."
Let's call this axiom $W(\phi, \omega)$. We'll use similar notation with $\omega$ replaced by other sets. Here is a challenging and important exercise: Let $X$ be an ordered set. Let $\phi$ be a statement about $\omega^X$, which may have other variables $y_i$ in it. Construct a specific statement $\sigma(\phi)$ about $X$, with other variables $z_i$ running through $X$, such that
$$ W(\sigma(\phi), X) \implies W(\phi, \omega^X) \quad (*).$$
For every specific $\phi$, the statement $(*)$ can be proved in PA. Since $W(\psi, \omega)$ is a axiom of PA for every $\psi$, we can prove $W(\phi, \omega^{\omega^{\ldots^{\omega}}})$ in PA for any $\phi$ and any specific height of tower.
But, in order to show that $\epsilon_0$ is well-ordered, we need to show that $W(\phi, \omega^{\omega^{\ldots^{\omega}}})$ simultaneously for every height of tower. Tracing through the arguments here, you would need to know $W(\phi, \omega)$, $W(\sigma(\phi), \omega)$, $W(\sigma(\sigma(\phi)), \omega)$, $W(\sigma^3(\phi), \omega)$ and so forth. As a human mathematician, that probably doesn't bother you at all. But, in the formal system PA, any proof can only use finitely many axioms. So there is no way to write a proof which uses all of the axioms $W(\sigma^k(\phi), X)$, for all $k$.
Of course, this doesn't show that some more clever argument couldn't prove that $\epsilon_0$ is well-ordered while working with PA; you need the Kirby-Paris theorem for that. (More precisely Kirby-Paris plus Godel shows that, if PA proves $\epsilon_0$ is well-ordered, then PA is inconsistent.) But I find that seeing this obstacle, the need to use infinitely many versions of the well-ordering axiom, clarifies my understanding of what is gong on.
I asked the same question about the replacement axiom not long ago at the $n$-Category Café, and the answer I got back from Mike Shulman is that it's used for example in the transfinite construction of free algebras, which really refers to a body of connected results in category theory as described here. The essential use made of replacement is in the transfinite compositions; this also occurs in the small object argument.
Having said that, a part of me still wonders whether there aren't workarounds. In many cases an initial algebra of a functor is situated inside a terminal coalgebra of the same functor, and the construction of the latter often doesn't require transfinite compositions (this is the case, e.g., for polynomial endofunctors). Paul Taylor in his book Practical Foundations of Mathematics has a section on general recursion using a theory of well-founded coalgebras, which is manifestly meaningful in contexts where one does without replacement, such as ETCS, and I wonder to what extent this could be put to use to construct free algebras without resorting to replacement.
Best Answer
For the existence of an uncountable set with a definable well-ordering, it suffices to have $\mathcal P(\mathcal P(\mathbb N))$, where $\mathcal P$ means the power set. Any countable order-type is represented by a well-ordering of $\mathbb N$, and can therefore be coded by a subset of $\mathbb N$. Call two such codes equivalent if the well-orderings they code are isomorphic. Then the equivalence classes are elements of $\mathcal P(\mathcal P(\mathbb N))$, they correspond naturally to the countable ordinals, and so they constitute an uncountable well-ordered set. (If you want the well-ordering itself to be a set, rather than given by a definition, and if you want it to be literally a set of ordered pairs, with ordered pairs defined in the usual Kuratowski fashion, then you'll need some more assumptions to ensure the existence of the relevant ordered pairs, etc. But if you're willing to settle for some other coding of ordered pairs, then $\mathcal P(\mathcal P(\mathbb N))$ seems to suffice.)
In particular, the existence of an uncountable well-ordered set is well within the power of Zermelo set theory (like ZF but without the axiom of replacement). If, on the other hand, you want the set of countable ordinals and if you use von Neumann's (nowadays standard) representation of ordinals by sets, then Zermelo set theory is not enough. One natural model of Zermelo set theory is the collection of sets of rank smaller than $\omega+\omega$; this contains plenty of uncountable well-ordered sets, but its ordinals are just those below $\omega+\omega$. (The moral of this story is that, in Zermelo set theory and related systems, ordinals should not be defined using the von Neumann representation but rather as isomorphism classes of well-orderings.)