I think the main reason replacement is seen as an essential part of ZF is that it follows naturally from the ontology of set theory, as do the other axioms of ZF. The ontology of set theory is rooted in the idea that sets are obtained by an iterative process along a well-ordered "ordinal clock", where at each step all the sets whose elements were generated earlier now appear. It is intuitively clear that, in order to be exhaustive, this process must go on for a long, long time. From this point of view, the replacement axiom can be stated intuitively: no set can index a family of ordinals that reaches to the end of the ordinal timeline. This is a natural consequence of the idea that the iterative process should be exhaustive.
Another interesting aspect of replacement is that (over the remaining axioms of ZFC) it is logically equivalent to the reflection scheme. (Informally: for any formula $\sigma(\vec{x})$ in the language of set theory, if $V \models \sigma(\vec{x})$ then there is an ordinal $\alpha$ such that $V_\alpha \models \sigma(\vec{x})$.) This is an extremely useful principle. One of its side effects is that ZFC is "self-justifying" in the sense that any finite fragment of ZFC is realized in some level $V_\alpha$ of the cumulative hierarchy. In other words, if one were to test set theory by examining a finite fragment of the axioms within the universe of sets, one would see that this finite fragment is not only consistent but that it has a model $V_\alpha$ arising from the same iterative process that generates all sets. In particular, ZFC comes very close to proving its own consistency, even though we know from Gödel's second incompleteness theorem that it cannot actually do so. This feature makes ZFC very appealing as a foundational theory. (Note that PA has a similar self-justifying feature, but ETCS doesn't appear to have it.)
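For reference, the reflection principle alluded to here is usually stated as the Lévy–Montague reflection scheme; one standard formulation (a sketch, writing $\sigma^{V_\alpha}$ for the relativization of $\sigma$ with all quantifiers restricted to $V_\alpha$) reads:

```latex
% Reflection scheme: one instance for each formula sigma(x_1,...,x_k)
\forall \beta \,\exists \alpha > \beta \;\;
\forall x_1,\ldots,x_k \in V_\alpha \;
\bigl( \sigma(x_1,\ldots,x_k) \;\leftrightarrow\; \sigma^{V_\alpha}(x_1,\ldots,x_k) \bigr)
```

In fact, for each fixed $\sigma$ the ordinals $\alpha$ witnessing reflection form a closed unbounded class.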
Another, more practical, use of replacement is to obtain "cheap universes". Grothendieck universes have proven useful for handling large objects. Unfortunately, one cannot prove their existence in ZFC. It is nevertheless often true that a "reasonable theorem" proven using Grothendieck universes is actually provable in ZFC. The reason is that the proof often doesn't make full use of all the features of a Grothendieck universe; a finite fragment of those features often suffices, and in such cases reflection provides a set $V_\alpha$ with all the features necessary to make the argument work. ETCS doesn't appear to have a good way of obtaining "cheap universes". This also hints at an alternative to replacement, which is to posit a hierarchy of universes similar to those used in dependent type theory.
Operations on dependent families are where the need for replacement arises most. It's actually really hard even to talk about dependent families in the language of ETCS. The main issue with ETCS isn't necessarily that it can't prove the existence of coproducts like $\coprod_{n \in \mathbb{N}} \mathcal{P}^n(\mathbb{N})$, but that it has a hard time even talking about the family of all the $\mathcal{P}^n(\mathbb{N})$ in the first place. Introducing universes would be an interesting way to get around that problem, but there are other means, all of which are likely to make the need for replacement-like principles clear.
As for the proposed workaround, it's unclear that you would gain much by this kind of process. Rather than ETCS, I'll work in BZC extended with terms for powerset and union and a constant symbol for $\omega$. The exponential-bounded formulas are defined like bounded formulas, except that the bounding terms can involve powerset and union.
Fact. If $\phi(x,y)$ is an exponential-bounded formula such that BZC proves $\forall x \exists y\, \phi(x,y)$, then there is a standard number $n$ such that BZC proves $\forall x \exists y(\phi(x,y) \land |y| \leq |\mathcal{P}^n(x \cup \omega)|)$.
Proof. Take a model $M$ of BZC and fix $x \in M$. Let $M' = \bigcup_n \{z \in M : |z| \leq |\mathcal{P}^n(x \cup \omega)|\}$, where $n$ ranges over the standard numbers only. Note that $M'$ is a model of BZC that contains $x$. By hypothesis, there is a $y \in M'$ such that $M' \models \phi(x,y)$. Note that exponential-bounded formulas are absolute between $M'$ and $M$, since all the sets one needs to inspect to determine whether $\phi(x,y)$ holds are in $M'$. Thus $M \models \phi(x,y)$, and also $M \models |y| \leq |\mathcal{P}^n(x \cup \omega)|$ for some standard $n$, by the definition of $M'$. The fact that a single standard $n$ provably works for all $x$ then follows by a compactness argument. $\square$
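As a toy illustration of the Fact (my example, not part of the argument above): if $\phi(x,y)$ is the exponential-bounded formula expressing $y = \mathcal{P}(x)$, then $n = 1$ already witnesses the bound, since

```latex
|\mathcal{P}(x)| \;\leq\; |\mathcal{P}(x \cup \omega)| \;=\; |\mathcal{P}^1(x \cup \omega)|.
```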
The examples $\omega, \mathcal{P}(\omega), \mathcal{P}^2(\omega), \ldots$ and $V, V^{\ast}, V^{\ast\ast}, \ldots$ are definable by a formula of the form $\exists z\,\phi(x,y,z)$ where $\phi(x,y,z)$ is exponential-bounded. Because of the nice bi-interpretation between ETCS and BZC, for these and similar examples, either ETCS doesn't prove that the $n$-th iterate exists for every natural number $n$, or ETCS already proves that replacement holds in this particular instance.
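To illustrate the shape of such a definition (a sketch of the standard trick, with the exact bounding terms elided): "$y = \mathcal{P}^n(\omega)$" can be written as $\exists z\,\phi(n,y,z)$ where $z$ witnesses the finite iteration,

```latex
\exists z \,\bigl(\, z \text{ is a function with domain } n+1
  \;\wedge\; z(0) = \omega
  \;\wedge\; \forall k < n \;\, z(k+1) = \mathcal{P}(z(k))
  \;\wedge\; y = z(n) \,\bigr)
```

and the matrix can be arranged to be exponential-bounded, since each clause only needs to quantify over sets bounded by powerset-and-union terms in the given data.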
Let me also address one aspect of the footnote, which states that replacement would be "the most likely culprit" if ZFC were found to be inconsistent. The "standard objection" to axiomatic set theory actually targets comprehension. If you think about it, comprehension is a rather bold statement: every formula in the language of set theory can be used to define a subset of a given set. The issue is that formulas can be complex beyond (human) understanding, and it's hard to justify the use of comprehension for formulas we can't understand. In fact, it's not clear that comprehension is fully justified by the ontology of set theory described above. (Note that the same kind of objection applies to PA, where one asks for induction to hold for arbitrary formulas.)
Set theory provides a foundation for mathematics in roughly the same way that Turing machines provide a foundation for computer science. A computer program written in Java or assembly language isn't actually a Turing machine, and there are lots of good reasons not to do real programming with Turing machines: real languages have all sorts of useful higher-order concepts. But Turing machines are a useful foundation because everything else can be encoded by Turing machines, and because it's much easier to study Turing machines than to study a more complicated higher-order language.
Similarly, the point isn't that every mathematical object is a set; the point is that every mathematical object can be encoded by a set. Set theory doesn't represent higher-level ideas, like the fact that mathematical objects usually have types (as one of my colleagues likes to point out, the question "is the integer 6 an abelian group?" is technically a reasonable one in set theory, but not in mathematics). But it's a (relatively) simple system to study, and just about everything we want to do can be encoded in set theory.
To answer your specific questions, yes, it's still true that every mathematical object can be encoded as a set. Because sets are very flexible, there's no reason to think this will not continue to be true. There is no current field of mathematics in which urelements are essential, and because things one would do with urelements can instead be encoded with sets, there is unlikely to be such a field.
ZFC does impose some limitations on category theory, because it doesn't allow objects on the same scale as the universe of sets. (For instance, the category of all categories is awkward to consider within ZFC, because the objects of this category do not form a set.) These limitations are reflected in the discussions of "small" and "locally small" categories. These issues can be worked around in mild extensions of ZFC by using things like Grothendieck universes. (Note that this is a feature of ZFC, not of set-theoretic foundations in general. Quine's New Foundations allows certain self-containing sets.)
This way of thinking can't really be a burden, because ZFC doesn't impose a way of thinking. The fact that things can be encoded as sets doesn't, and shouldn't, mean that we always think of them that way. It's perfectly consistent with having a set-theoretic foundation to work with things like urelements, or to think about groups and categories without thinking of them as sets. (Worrying about things like self-containing categories can be a burden, but it's a necessary one given the history of paradoxical objects containing themselves.)
The answer is yes; in fact one gets a lot better than bi-interpretability, as shown by the corollary at the end. It follows by combining the comments by Martin Brandenburg and mine (and a few additional details I found on MO). The key observation is the following:
Theorem: The category of co-group objects in the category of groups is equivalent to the category of sets.
(According to the nLab, this is due to Kan, from the paper "On monoids and their dual" Bol. Soc. Mat. Mexicana (2) 3 (1958), pp. 52-61, MR0111035)
Co-groups are easily defined in purely categorical terms (see Edit 2 below).
The equivalence of the theorem is given by free groups, as follows: if $X$ is a set and $F_X$ is the free group on $X$, then $\operatorname{Hom}(F_X,H) = H^X$ is a group, functorially in $H$, hence $F_X$ carries a cogroup structure. A function $u: X \rightarrow Y$ induces a re-indexing map $H^Y \rightarrow H^X$ that is a group morphism for each $H$, so the induced map $F_u: F_X \rightarrow F_Y$ is a cogroup morphism; thus morphisms between sets indeed give cogroup morphisms.
Explicitly, $\mu: F_X \rightarrow F_X * F_X$ is the map that sends each generator $e_x$ to $e_x^L * e_x^R$ (the copies of $e_x$ in the left and right free factors, respectively), and $i$ is the map that sends each generator to its inverse.
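As a sanity check (a routine computation, spelled out here rather than taken from the original argument), the cogroup laws can be verified directly on a generator $e_x$:

```latex
(Id, 0) \circ \mu(e_x) = e_x \cdot 1 = e_x, \qquad
(Id, i) \circ \mu(e_x) = e_x \cdot e_x^{-1} = 1, \qquad
(\mu * Id) \circ \mu(e_x)
  = e_x^{(1)} * e_x^{(2)} * e_x^{(3)}
  = (Id * \mu) \circ \mu(e_x),
```

so $\mu$ is counital, compatible with the coinverse $i$, and coassociative; since the generators generate $F_X$, the laws hold everywhere.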
An easy calculation shows that the generators are the only nontrivial elements satisfying $\mu(y) = y^L * y^R$; hence any cogroup morphism, which must preserve this property, sends generators to generators, and therefore comes from a function between sets.
And with a bit more work, as nicely explained on this other MO answer, one can check that any cogroup object is of this form.
Now, as all of this is a theorem of $\sf{ETCS}$, it is a theorem of $\sf{ETCG}$ that all the axioms (and theorems) of $\sf{ETCS}$ are satisfied by the category of cogroup objects in any model of $\sf{ETCG}$, which gives the desired bi-interpretability between $\sf{ETCS}$ and $\sf{ETCG}$. Adding supplementary axioms to $\sf{ETCS}$ (like replacement) does not change anything.
In fact, one has more than bi-interpretability: the two theories are equivalent in the sense that there is an equivalence between their categories of models. But one has better still:
Corollary: Given $T$ a model of $\sf{ETCS}$, then $Grp(T)$ is a model of $\sf{ETCG}$. Given $A$ a model of $\sf{ETCG}$, then $CoGrp(A)$ is a model of $\sf{ETCS}$. Moreover these two constructions are inverse to each other up to equivalence of categories.
Edit: this is an answer to a question of Matt F. in the comments, giving explicit examples of how axioms and theorems of $\sf{ETCS}$ translate into $\sf{ETCG}$.
So in $\sf{ETCS}$ there is a theorem (perhaps an axiom) that given a monomorphism $S \rightarrow T$, there exists an object $R$ such that $T \simeq S \coprod R$.
In $\sf{ETCG}$ this translates as: given a cogroup object $T$ and a cogroup monomorphism* $S \rightarrow T$, there exists a cogroup $R$ such that $T \simeq S * R$ as cogroups**.
*: It is also a theorem of $\sf{ETCG}$ that a map between cogroups is a monomorphism of cogroups if and only if the underlying map of objects is a monomorphism. Indeed, that is something one can prove for the category of groups in $\sf{ETCS}$, so it holds in $\sf{ETCG}$ by definition.
**: One can prove in $\sf{ETCG}$ (either directly, because this actually holds in any category, or by proving it for groups in $\sf{ETCS}$) that the coproduct of two cogroup objects has a canonical cogroup structure which makes it the coproduct in the category of cogroups.
Edit 2: To clarify that the category of cogroups is defined purely in categorical language:
The coproduct in the category of groups is the free product $G * H$, and it is definable by its usual universal property.
A cogroup is then an object (here a group) equipped with a map $\mu: G \rightarrow G * G$ which is co-associative, that is $(\mu * Id_G) \circ \mu = (Id_G * \mu) \circ \mu$, and counital (the co-unit has to be the unique map $G \rightarrow 1$), that is $(Id_G, 0) \circ \mu = Id_G$ and $(0, Id_G) \circ \mu = Id_G$, where $(f,g)$ denotes the map $G * G \rightarrow G$ which is $f$ on the first component and $g$ on the second, as well as an inverse map $i: G \rightarrow G$ such that $(Id_G, i) \circ \mu = 0$. Morphisms of cogroups are the maps $f: G \rightarrow H$ compatible with all this structure, most importantly such that $\mu_H \circ f = (f * f) \circ \mu_G$.
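For instance (spelling out the free-group case discussed above), for a function $u: X \rightarrow Y$ the induced map $F_u: F_X \rightarrow F_Y$ satisfies this compatibility on each generator $e_x$:

```latex
\mu_Y \circ F_u(e_x) \;=\; e_{u(x)}^L * e_{u(x)}^R
  \;=\; (F_u * F_u)(e_x^L * e_x^R)
  \;=\; (F_u * F_u) \circ \mu_X(e_x),
```

so $F_u$ is a morphism of cogroups, as used in the proof of the theorem.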
If you have doubts related to the "choice" of the object $G * G$ (which is only defined up to unique isomorphism), a way to dispel them is to define "a cogroup object" as a triple of objects $G,\ G * G,\ G * G * G$ with appropriate maps between them satisfying a bunch of conditions (including the universal property), and morphisms of cogroups as triples of maps satisfying all the expected conditions. This gives an equivalent category.