My personal opinion is that one should consider the 2-category of categories, rather than the 1-category of categories. I think the axioms one wants for such an "ET2CC" will be something like:
- Firstly, some exactness axioms amounting to its being a "2-pretopos" in the sense I described here: http://ncatlab.org/michaelshulman/show/2-categorical+logic . This gives you an "internal logic" like that of an ordinary (pre)topos.
- Secondly, the existence of certain exponentials (this is optional).
- Thirdly, the existence of a "classifying discrete opfibration" $el\to set$ in the sense introduced by Mark Weber ("Yoneda structures from 2-toposes") which serves as "the category of sets," and internally satisfies some suitable axioms.
- Finally, a "well-pointedness" axiom saying that the terminal object is a generator, as is the case one level down in ETCS. This is what says you have a 2-category of categories, rather than (for instance) a 2-category of stacks.
Once you have all this, you can use finite 2-categorical limits and the "internal logic" to construct all the usual concrete categories out of the object "set". For instance, "set" has finite products internally, which means that the morphisms $set \to 1$ and $set \to set \times set$ have right adjoints in our 2-category Cat (i.e. "set" is a "cartesian object" in Cat). The composite $set \to set\times set \to set$ of the diagonal with the "binary products" morphism is the "functor" which, intuitively, takes a set $A$ to the set $A\times A$. Now the 2-categorical limit called an "inserter" applied to this composite and the identity of "set" can be considered "the category of sets $A$ equipped with a function $A\times A\to A$," i.e. the category of magmas.
Now we have a forgetful functor $magma \to set$, and also a functor $magma \to set$ which takes a magma to the triple product $A\times A \times A$. There are two 2-cells from the latter to the former, constructed from two different composites of the inserter 2-cell defining the category of magmas. It makes sense to call the "equifier" (another 2-categorical limit) of these 2-cells "the category of semigroups" (sets with an associative binary operation). Proceeding in this way we can construct the categories of monoids, groups, abelian groups, and eventually rings.
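In symbols, the two limits used above might be sketched as follows. The names $\Delta$, $P$, $U$, $\mu$, $T$, $\sigma_1$, $\sigma_2$ and $E$ are introduced here purely for illustration and are not fixed in the text. Write $\Delta: set \to set \times set$ for the diagonal and $P: set \times set \to set$ for the binary-products morphism. The inserter of $P\Delta$ and $1_{set}$ is then an object $magma$ equipped with

$$U : magma \to set, \qquad \mu : P \Delta U \Rightarrow U,$$

such that, for every object $X$, composing with $(U,\mu)$ identifies $Cat(X, magma)$ with the category of pairs $(F: X \to set,\ \phi: P\Delta F \Rightarrow F)$ (an equivalence, or an isomorphism if the limit is taken strictly). Writing $T: set \to set$ for the triple-product functor and suppressing the associativity isomorphism, the two 2-cells $\sigma_1, \sigma_2 : TU \Rightarrow U$ have components at a magma $(A, m)$ given by

$$\sigma_1 = m \circ (m \times 1_A), \qquad \sigma_2 = m \circ (1_A \times m) \ : \ A \times A \times A \to A,$$

and the equifier is the universal morphism $E : semigroup \to magma$ with

$$\sigma_1 E = \sigma_2 E,$$

which is exactly the associativity law imposed on $m$.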
A more direct way to describe the category of rings with a universal property is as follows. Since $set$ is a cartesian object, each hom-category $Cat(X,set)$ has finite products, so we can define the category $ring(Cat(X,set))$ of rings internal to it. Then the category $ring$ is equipped with a forgetful functor $ring \to set$ which has the structure of a ring in $Cat(ring,set)$, and which is universal in the sense that we have a natural equivalence $ring(Cat(X,set)) \simeq Cat(X,ring)$. The above construction then just shows that such a representing object exists whenever Cat has suitable finitary structure.
One can hope for a similar elementary theory of the 3-category of 2-categories, and so on up the ladder, but it's not as clear to me yet what the appropriate exactness properties will be.
I have to admit that this is not really an answer but rather a meta-answer, with some very general remarks which I hope do not bore everyone reading this. It seems necessary to me in order to explain why, as Yemon already says in the comments and I strongly agree, it is rather misguided to ask whether some book introduces elementary number theory by means of category theory.
Mathematics is all about the nontrivial, unexpected relationships. Category theory is not really about finding such relationships, but rather about the correct setting, language and color in which some theory is developed. This point of view does not really contradict the development of category theory into a huge area of mathematics in its own right, full of nontrivial deep theorems, because often there is some geometric or other background which is our real motivation. There are ubiquitous examples (model categories, topoi, stacks, $\infty$-categories, ...) which I don't want to elaborate on here.
Anyway, as I said, mathematics really starts when something unexpected happens, which does not follow from general category theory. For example, the covariant functor $\hom(X,-)$ is always continuous, but when is it also cocontinuous, or when does it at least preserve filtered colimits? It turns out that this leads to a natural finiteness condition on $X$: we then call $X$ finitely presented. But to arrive finally at the question: $\mathbb{Z}$ is easily seen to be an initial object in the category of rings, but what theorems from category theory are known about initial objects? Well, there is nothing to say, except that any two initial objects are canonically isomorphic, which is just a trivial consequence of the definition. So $\hom(\mathbb{Z},-)$ is easy to describe, but what about the contravariant functor $\hom(-,\mathbb{Z})$? What happens when you plug in $\mathbb{Z}[x,y,z]/(x^n+y^n=z^n)$ for some fixed $n>2$? Does category theory help you to understand this? This example also shows that although the Yoneda Lemma says that an object $X$ of a category is determined by its functor $\hom(X,-)$, it does not tell you anything about the relationship of $X$ with other objects, for example when we simply reverse the arrows. Instead, we have to use a specific incarnation of the category and its objects in order to derive something which was not there just by abstract nonsense.
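To make the contrast concrete, here is the example spelled out as a pair of worked equations. The notation $R_n$ is introduced here only for illustration; the relation $x^n+y^n=z^n$ is written as the generator $x^n+y^n-z^n$ of the ideal. For every ring $R$, initiality of $\mathbb{Z}$ gives

$$\hom(\mathbb{Z}, R) \cong \{\ast\},$$

whereas a ring homomorphism out of a quotient of a polynomial ring is determined by the images of the variables, subject to the relation, so for $R_n = \mathbb{Z}[x,y,z]/(x^n+y^n-z^n)$ we get

$$\hom(R_n, \mathbb{Z}) \cong \{(a,b,c) \in \mathbb{Z}^3 : a^n + b^n = c^n\}.$$

The first bijection is pure category theory. The second is also formal, but describing the set on the right for $n>2$ (namely, showing that every such triple has $abc = 0$) is precisely Fermat's Last Theorem, and nothing in the universal property helps with that.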
Perhaps related questions are more interesting: Which investigations in elementary number theory have led to some category theory (for example, via categorification), which was then applied to other categories as well, thus establishing nontrivial analogies? Or for the other direction, which general concepts become interesting in elementary number theory after some process of decategorification? But in any case, it should be understood that you have to digest elementary number theory before that ...
Best Answer
There are several articles that I wrote on ETCS, which originally appeared on the (currently inactive) blog Topological Musings. The nLab articles are nothing more than transcriptions of what I had written into MathML, which is what we use at the nLab. They stop a little short of what you are asking for specifically, so perhaps I can fill the gap now, and say how I think I might have proceeded.
As already mentioned by David and Sridhar, ETCS differs from traditional set theories that are based on a global membership relation (theories whose underlying signature consists of a single binary relation $\in$). Instead, ETCS spells out axioms that one expects to hold for a category of sets and functions. For those who speak the language, the axioms amount to saying that a model of ETCS is a topos with a natural numbers object, such that the terminal set is a generator and the axiom of choice ("epis split") holds.
In this framework, one treats "union" as an operation which internalizes the external operation of taking joins in subset lattices. Thus, if $X$ is a set (or an object if you like), the union operation relative to $X$ is an appropriate morphism
$$\bigcup: PPX \to PX$$
where $PX$ denotes the power set/object of $X$. By the universal property of power objects, this morphism corresponds to a subobject of $X \times PPX$. This subobject is specified by the formula (of an internal language for toposes)
$$\exists_{A: PX} (x \in_X A) \wedge (A \in_{PX} C)$$
where $x$ is of type $X$ and $C$ is of type $PPX$.
There are several ways of doing this, even if one is not familiar with the internal language of a topos. One way, which works for general toposes, proceeds by interpreting the quantifier $\exists_{A: PX}$ directly in terms of image factorizations. Namely, consider the image factorization of the composite
$$[(x \in_X A) \wedge (A \in_{PX} C)] \hookrightarrow X \times PX \times PPX \stackrel{proj}{\to} X \times PPX$$
to get the desired subobject $I \hookrightarrow X \times PPX$. (Of course, this requires that one construct image factorizations in a topos, as treated in any standard text.) The subobject described in brackets is, in turn, a pullback of the form
$$(1_X \times \delta \times 1_{PPX})^\ast(\in_X \times \in_{PX})$$
where $\in_X \hookrightarrow X \times PX$ and $\in_{PX} \hookrightarrow PX \times PPX$ are the canonical subobjects, and where $\delta: PX \to PX \times PX$ is the diagonal. Then, as said before, the map $PPX \to PX$ which classifies this image $I \hookrightarrow X \times PPX$ is the desired internal union relative to $X$.
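As a sanity check, the image-factorization recipe can be traced in the topos of finite sets, where everything is computable. The following is only a minimal sketch under that assumption: subobjects are modelled as Python sets of tuples, and the helper names `powerset` and `internal_union` are hypothetical, chosen here for illustration.

```python
from itertools import combinations


def powerset(s):
    """Return all subsets of s as a list of frozensets (a model of PX)."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def internal_union(X):
    """Build the map PPX -> PX as a dict, via pullback, projection and image."""
    PX = powerset(X)
    PPX = powerset(PX)

    # The canonical membership subobjects of X x PX and PX x PPX.
    in_X = {(x, A) for x in X for A in PX if x in A}
    in_PX = {(A, C) for A in PX for C in PPX if A in C}

    # Pullback of (in_X x in_PX) along 1 x diagonal x 1:
    # triples (x, A, C) with x in A and A in C.
    bracket = {(x, A, C)
               for x in X for A in PX for C in PPX
               if (x, A) in in_X and (A, C) in in_PX}

    # Image of the projection X x PX x PPX -> X x PPX.
    image = {(x, C) for (x, A, C) in bracket}

    # Classifying map of the image: C |-> {x : (x, C) in image}.
    return {C: frozenset(x for x in X if (x, C) in image) for C in PPX}


X = {0, 1}
union = internal_union(X)
C = frozenset({frozenset({0}), frozenset({1})})
print(union[C] == frozenset({0, 1}))  # True
```

In this finite model the resulting map agrees with the ordinary union of a family of subsets, which is what the internal formula is meant to capture.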
The second way to go is to realize that a model of ETCS is in particular a Boolean topos. Then, if one has already constructed universal quantification (see for instance the second of the three articles in the ETCS series), one can easily interpret the formula
$$\neg \forall_{A: PX} (x \in_X A) \Rightarrow \neg(A \in_{PX} C)$$
once one has defined internal negation, which is not difficult. This circumvents the need to first construct images, but only works in the Boolean case.
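The Boolean variant can likewise be checked in finite sets. This is a minimal sketch, reading the formula as $\neg \forall_{A: PX} [(x \in_X A) \Rightarrow \neg(A \in_{PX} C)]$ and using Python's `all` and `not` as stand-ins for internal universal quantification and negation; the helper names `powerset` and `union_via_forall` are hypothetical.

```python
from itertools import combinations


def powerset(s):
    """Return all subsets of s as a list of frozensets (a model of PX)."""
    items = list(s)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def union_via_forall(X):
    """Build PPX -> PX from the formula: not forall A. (x in A => not (A in C))."""
    PX = powerset(X)
    PPX = powerset(PX)
    return {
        C: frozenset(
            x for x in X
            # (p => not q) is encoded classically as (not p) or (not q).
            if not all((x not in A) or (A not in C) for A in PX)
        )
        for C in PPX
    }


union = union_via_forall({0, 1})
C = frozenset({frozenset({0}), frozenset({1})})
print(union[C] == frozenset({0, 1}))  # True
```

No image is computed here, only a quantifier and two negations. The classical encoding of implication in the comprehension is the step that relies on Booleanness.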
However one spells out the details, the larger point is that in ETCS, membership relations are local and relative to objects $X$, in the form of universal subobjects $\in_X \hookrightarrow X \times PX$, as opposed to being given by a single global relation $\in$ that obtains on the class of objects. Correspondingly, set-theoretic operations like union and intersection are also local and relative in this sense. Otherwise, the first-order formulas that specify such operations -- the ones we all know and love -- work pretty much the same way; in ETCS, the relevant operations may be constructed by clever exploitation of universal properties of relations $\in_X$, and not just asserted to exist by way of a comprehension or separation axiom scheme.