Well. That depends on whom you ask.
Set theory might be inconsistent; in particular $\sf ZFC$ and its extensions by large cardinal axioms. It is a nontrivial thing to feel safe with these theories, and it takes a lot of practice and time until you understand that $\sf ZFC$ is self-evident to some extent, and that [some] large cardinal axioms are somewhat self-evident as well.
But of course, we might be tricked. Crooks have been known to seem honest, until you're left with a big heap of nothing in your hands. That is why Nigerian royalty is going to have a very hard time emailing people around the world.
If that happens, we need to ask ourselves where the problem lies. Is it in one of the axioms, or maybe specifically in the existence of infinite sets? Will a slightly weaker set theory (e.g. $\sf ZC$) work better, or maybe we have to resort to arithmetic theories to fix things?
Our grasp of things is usually largely inconsistent,$^1$ but we do like to think in "types": working inside different categories when needed, or working with some type theory or another. A lot, and I mean a lot, of people will twitch when you ask them what the elements of $\pi$ are, or whether $\frac13$ is a subset of $e$.
Of course, those who have a firm understanding of this know that it is a question of implementation, like asking whether the machine code of one implementation of an algorithm is the same as or different from another. But people don't think about it this way, although in a way they kinda do.
Instead, people focus on their math, and they just remember (or more accurately: they don't) that you can formalize this in terms of sets inside set theory.
$\sf ZFC$ is incomplete. That doesn't sound like a big deal. But it is, in a deep sense. We might half expect a foundational theory to be able to decide all sorts of things about the mathematical universe. Enough so that most, if not all, questions we might have can be answered there. Again, those who study foundations long enough should realize this is not the goal of a foundational theory, but it is somewhat expected.
And some people feel very awkward with this incompleteness. They will also have a problem with arithmetic-based foundations, since those are incomplete as well. But it is true that in $\sf ZFC$ the incompleteness phenomena are far more gaping: as soon as you leave the realm of the "countable" and enter the "uncountable", more or less everything becomes open and independent.
This poses an actual problem. Imagine that half the functional analysts worked in one set theory (say $V=L$), and the other half in another (say $\sf ZFC+PFA$). It would create strange discrepancies which would eventually tear the field apart. And the same goes for everything else, except set theory itself, whose main occupation is these different axioms.
But that's just counting three strikes against $\sf ZFC$. And I cannot, in good conscience, finish my post like this. So let's balance things out with the reasons why $\sf ZFC$ is a good foundation.
Set theory, and in particular $\sf ZFC$, is quite self-evident. When I was a starting master's student, I attended a course by one of the generation's greatest mathematicians and asked him about the axioms of $\sf ZFC$, and he said that a good foundational theory is one whose axioms you don't feel you're using. It might be that we're trained to think in "set theory ways", but it is also true that the axioms give you exactly the expressive power to talk about everything you care about without caring about one specific implementation over another (looking your way, Replacement axioms!), which is quite great.
Can you imagine a mathematical universe where the reals constructed with Dedekind cuts and with Cauchy sequences are different? (Those mathematical worlds exist in other foundations, by the way, and at least to me, that is weird.)
The previous point leads us exactly to this argument. Humans take an abstract algorithm and implement it in various languages, on various processors, using various data structures and types. But the code and data all turn into electronic signals. Working with high-level objects like $\Bbb N$ and $\Bbb R$, function spaces, and so on is our version of the algorithm. And we can turn that into electronic signals, or in this case, into sets and first-order structures.
We like sets more than we are willing to admit. One of the problems of second-order logic is that the logic is incomplete.$^2$ If you have an algorithm, given by a list of inference rules, for verifying proofs, then there will be statements which are true (even valid) that these inference rules cannot prove. So your proof theory is quite insufficient in this respect, because at the very least the valid sentences should be provable.
So we can, instead, use a first-order based foundation to fix this. Instead of proving something about $\Bbb R$, we prove that $\sf ZFC$ proves that thing about $\Bbb R$; and this we can verify mechanically. So one option is to resort to some arithmetic foundation. But this causes a different problem. We cannot "take objects" anymore. We don't have sets of reals, or rings, or groups. So when you prove something about groups, you can't say "Let $G$ be an abelian group, then bla bla bla" anymore. You have to say "The theory of abelian groups proves that bla bla bla".
And this is quite a big issue, since we think about mathematics in some material sense. The objects exist somewhere. They are not just axiomatic consequences. And set theory, in particular $\sf ZFC$ with its implementation agnosticism, provides us with the means to do exactly this.
Footnotes.
(1) Is $\Bbb N\subseteq\Bbb R$? Often the answer is yes; many other times the answer is no. It really depends on what you want to do, or how you mostly use these two objects. But since we know that either method can be replaced by the other, we're not worried about it too much. Still, it is a concrete question that has a convenient answer either way, and we exploit that. And that, in a nutshell, is a "global inconsistency" in our thinking.
You can argue about this, but it's going to be beside the point.
(2) Incomplete here means in the sense of the completeness theorem. Something true in every model need not be provable. This is a different kind of incompleteness than the one referred to earlier, where $\sf ZFC$ is incomplete in the sense that it does not prove or refute every statement.
The simplest answer is that he is using certain objects such as natural numbers and functions via their abstracted properties (called interfaces in programming languages), instead of pinning down any specific objects that have those properties (called implementations in programming languages). The reason is that we usually care about the interfaces, and rarely about how exactly they are implemented.
Remember that usually mathematical objects do not come alone but as members of some structure, and the properties we want them to have are better understood as properties of the structure as a single whole. That also means that we are free to change the structure as long as it satisfies the same properties.
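The interface/implementation split invoked above can be sketched in code. This is a minimal illustration, not anything from the original answer: the `Stack` interface and both implementations are invented for the example, but the point carries over directly, since the client code never looks past the interface.

```python
from abc import ABC, abstractmethod

class Stack(ABC):
    """The interface: clients rely only on these operations."""
    @abstractmethod
    def push(self, x): ...
    @abstractmethod
    def pop(self): ...

class ListStack(Stack):
    """One implementation: backed by a Python list."""
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()

class LinkedStack(Stack):
    """Another implementation: backed by linked pairs (head, rest)."""
    def __init__(self):
        self._node = None
    def push(self, x):
        self._node = (x, self._node)
    def pop(self):
        x, self._node = self._node
        return x

def reverse(items, stack: Stack):
    """Client code: written against the interface, oblivious to the implementation."""
    for x in items:
        stack.push(x)
    return [stack.pop() for _ in items]

print(reverse([1, 2, 3], ListStack()))    # [3, 2, 1]
print(reverse([1, 2, 3], LinkedStack()))  # [3, 2, 1]
```

Just as here, a proof about "the natural numbers" written against the Peano interface does not care whether the underlying sets are Zermelo's or von Neumann's.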
The typical example is the natural numbers. Almost all the time we only care about the properties that they have in relation to one another, like $0 < 1 < 2 < 3 < \cdots$, and induction over the natural numbers. These properties are all required to hold regardless of how the collection of natural numbers is actually formed. ZFC has many ways of exhibiting a model of PA, and in fact Zermelo's original construction of one of these models was different from the now-standard von Neumann construction. We use the latter now because this is one of the rare cases where we find it more convenient to have the natural numbers be a transitive set, so that they are von Neumann ordinals. There are technical advantages to that, because we can use the built-in symbol $\in$ for the ordering on ordinals. One can say that we like the structure of the natural numbers to internally satisfy PA and externally be a substructure of the structure of ordinals, so there is special reason to privilege the von Neumann construction of the natural numbers.
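The von Neumann construction is concrete enough to mimic directly. As a sketch (using Python frozensets to stand in for hereditarily finite sets; the function name is invented), each ordinal is the set of all smaller ordinals, so $\in$ doubles as the strict order:

```python
def von_neumann(n):
    """Return the von Neumann ordinal n as a frozenset: n = {0, 1, ..., n-1}."""
    s = frozenset()     # 0 is the empty set
    for _ in range(n):
        s = s | {s}     # successor: S(n) = n ∪ {n}
    return s

three = von_neumann(3)
# Membership is the ordering: 1 ∈ 3 because 1 < 3.
print(von_neumann(1) in three)   # True
print(three in von_neumann(1))   # False
# Each ordinal n has exactly n elements (its predecessors).
print(len(three))                # 3
```

This transitivity (every element of an ordinal is also a subset of it) is exactly the technical convenience the paragraph above mentions.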
Another typical example is the real numbers. This time we never in practice care how they are constructed. As long as they satisfy the field properties and the second-order completeness axiom, that is good enough. One can say that we only need the structure of the reals to satisfy internal properties, and we don't care about any external properties. So whether we choose Cauchy sequences of rationals, Dedekind cuts of the rationals, or decimal sequences, it doesn't matter at all. The only point at which it matters is when you want to construct the structure to show that there is some instance satisfying the desired properties; in some weak systems some of these constructions are not possible. Thereafter we use the real numbers only via their interface.
Similarly for functions. In ZFC they can be encoded as special kinds of subsets of the cartesian product of the domain and codomain, but in practice we don't care, and sometimes don't want that restriction, when dealing with class functions for example. You see, you think of the powerset symbol $\mathcal{P}$ as a function, don't you? But it can't be a set in ZFC unless you want a contradiction. Same for the cardinality map, and in general for any function-like thing whose domain or codomain is not a set. One could work in the strictly stronger MK set theory to avoid that issue, but the point remains that we don't really care how functions are implemented as long as we can use them the way we expect them to function.
To ensure that the mathematical structures one uses can be constructed in ZFC, one just has to make sure that either there is a set in ZFC that satisfies the desired interface (such as for natural numbers and reals), or there is a syntactic translation from one's proof to a proof in ZFC (such as proofs using class functions, which often but not always can be translated systematically into proofs that work in ZFC, simply by replacing the use of those functions by formulae), or that one is working in a conservative extension of ZFC (such as using full abbreviation power).
History, unfortunately, is more complicated than your choices permit.
Presumably, Zermelo had proposed his 1908 axioms in support of his well-ordering theorem. Insofar as his axioms might relate to intuitive notions of set, his original axioms admitted urelements and relied on singletons to substantiate the truth of equality statements. The idea that a theory of pure sets suffices for mathematics is attributed to Skolem, as far as I know. With a little further consideration, this may be seen as assuming the arithmetization of mathematics. As such, one must recognize the chicken-and-egg problem that arises if linguistic symbols that are not pure sets become conflated with some Gödelian encoding for the sake of claiming that everything is a set.
Zermelo's reliance on singletons reflects the relationship of his system with Frege's logic. Category theorists claim that Zermelo had to make a choice between differing views expressed by Frege and Cantor. Since the relevant text is not translated into English, I cannot judge this claim. Nevertheless, Frege speaks of object identity and concept identity in a circular fashion that arises from Leibniz's principle of the identity of indiscernibles. Zermelo's use of singletons ought to be understood in this context. Foundationalism in the sense of Russell's logical atomism stipulates that object identity is fundamental. Hence, one has "first-order logic" with the proviso that "identity" is a "second-order" concept.
Since modern accounts of Zermelo-Fraenkel set theory are now based upon the first-order paradigm and the presumption that a theory of pure sets suffices for mathematics, urelements are treated in a derivative theory often called ZFA, where the 'A' stands for (logical) atoms. The theory assumes that the urelements form a set (in conformance with the "everything is a set" presupposition). And it is an important theory because of the models that arise from permuting the urelements.
It is, however, entirely plausible to understand urelements with respect to an antichain relation in a different signature. Among other things to be found in Russell's writings is the question of how a set is comparable to an object. While no one would ask that particular question today, the fact that urelements are not sets means that an antichain whose denial yields an order based on membership is entirely conceivable.
Such a theory would permit one to assert how non-sets satisfy the real number axioms, for example.
Understanding the distinction between urelements and sets in this manner permits one to understand how Cohen's forcing can extend a given model with "ideal subsets." The axiom of extension is contingent upon the denial of an antichain.
But, since such a view is not of interest to established mathematicians, students of set theory need to focus on the affirmative claims of first-order logic and set theory without concern for "intuition."
With regard to your second question, let me suggest that you look at the papers by Augustus de Morgan found in Ewald's two-volume anthology, "From Kant to Hilbert." This will provide you with a good foundation for understanding what the assertion "mathematics is formal" really means. The received view corresponding with first-order model theory reduces mathematics to language signatures, stipulation of relations and operations involving the symbols of such signatures, and a specification of the arities for each symbol in the signature. These constructs are often called "similarity types."
Of course, the "first-order" designation goes back to the foundationalist stipulation that object identity is fundamental. So, language signatures will contain a symbol denoting an extensional domain of discourse. Extension is important to mathematical practice so that linguistic expressions intended to have "the same meaning" denote "singularly."
It had been Skolem's criticism of Zermelo's theory that gave impetus to this use of language signatures in mathematics over the notion that "mathematics is extensional." This particular claim arises from a different history. It is, of course, important for model theory. But, in the model theory of set theory it yields the "unfolding" and "geology" of possible models studied by set theorists using Cohen's forcing. What Skolem observed about Zermelo's axioms is that they could not be "categorical" in the sense of specifying a single, unique interpretation of the language symbols.