(1) This is actually not a problem in the form you have stated it -- the rules of what is a valid proof in first-order logic can be stated without any reference to sets, such as by speaking purely about operations on concrete strings of symbols, or by arithmetization with Gödel numbers.
However, if you want to do model theory on your first-order theory you need sets. And even if you take the syntactical viewpoint and say that it is all just strings, that just pushes the fundamental problem down a level, because how can we then formalize reasoning about natural numbers (or symbol strings) if first-order logic itself "depends on" natural numbers (or symbol strings)?
The answer to that is that is just how it is -- the formalization of first-order logic is not really the ultimate basis for all of mathematics, but a mathematical model of mathematical reasoning itself. The model is not the thing, and mathematical reasoning is ultimately not really a formal theory, but something we do because we intuitively believe that it works.
(2) This is a misunderstanding. In axiomatic set theory, the axioms themselves are the definition of the notion of a set: A set is whatever behaves like the axioms say sets behave.
(3) What you quote is how functions usually are modeled in set theory. Again, the model is not the thing, and just because we can create a model of our abstract concept of functional relation in set theory, it doesn't mean that our abstract concept an sich is necessarily a creature of set theory. Logic has its own way of modeling functional relations, namely by writing down syntactic rules for how they must behave -- this is less expressive but sufficient for logic's need, and is no less valid as a model of functional relations than the set-theoretic model is.
Zermelo–Fraenkel set theory is formulated in a language that has a binary relation $\in$, and that's it. The symbols $1,2,3,4$ and $+$ make "no sense" from a syntactic point of view.
So first one needs to explain what is $2$ in the context of set theory. This means that you need to define some sets that will behave like you would expect of the natural numbers. Normally, these are the finite von Neumann ordinals, defined recursively as $0=\varnothing$ and $n+1=n\cup\{n\}$. There are other ways to define the natural numbers, or you could be thinking about the real numbers, and then you need to define those as well first, but for now let's stick with the von Neumann ordinals and see where it takes us.
Note that even in the case of Peano arithmetic or ring theory, the symbol for $2$ and the symbol for $4$ are not part of the language. They are used as shorthand for either repeated sums of $1$ or repeated application of the successor function to $0$, or so on.
So if $2$ is defined as $1+1$ and $4$ is defined as $((1+1)+1)+1$, and you have an axiom saying that $(x+y)+z=x+(y+z)$, then you're practically done: $(1+1)+(1+1)=((1+1)+1)+1$, so $2+2=4$. But okay, let's go back to set theory.
We have the natural numbers, so $2=\{\varnothing,\{\varnothing\}\}$ and $4$, well, let's just write $4$. After this you need to ask what is $+$. Do you think about $+$ as a recursive definition of applying the successor function? Even the Peano axioms which usually form the basis for arithmetic have $+$ as a standalone object, and there is a connection between $+$ and the successor function. There are two standard ways to define what is $+$ on the natural numbers in the context of set theory:
Successive application of the successor function, so $x+0=x$ and $x+S(y)=S(x+y)$. In that case, $2+2=S(1)+S(1)=S(S(0))+S(S(0))=\dots=S(S(S(S(0))))$. And now we can also translate back into sets to get "a fully set theoretic statement". But it's awful, and I don't want to do it.
Using cardinality of sets. Note that $n$ is a set with exactly $n$ elements. $0$ has no elements and $1$ has exactly one element (which is $0$, as it turns out). So we can define $n+m=k$ if and only if the number of elements in a disjoint union of a copy of $n$ and a copy of $m$, is $k$. Or, in modern terms, there is a bijection between the two sets.
In that case, one needs to write down a function from $\{\langle 0,0\rangle,\langle 0,1\rangle\}\cup\{\langle 1,0\rangle,\langle 1,1\rangle\}$, or $\{0\}\times2\cup\{1\}\times2$, and $4$ which is $\{0,1,2,3\}$. Of course, this starts with defining what is an ordered pair from a set theoretic view, what is a function, etc., and then write this as a set theoretic expression again. Which, as before, is an awful exercise in futility.
The easiest way, at the end, is to prove this "by blocks". First prove there is a way to formalize the natural numbers, then formalize addition, then show that no matter what formalization you chose, as long as it satisfies certain properties (which you would expect from the natural numbers), it has to be the case that the object corresponding to $2+2$ is the object corresponding to $4$.
Or just be annoying and declare that you interpret $2,+$ and $4$ in a non-standard way so $2+2=5\neq 4$. Whatever floats your boat.
Best Answer
The usual approach is to define a concept of a first order language $\mathcal{L}$. They are usually specified by the nonlogical symbols. Well-formed formulas in the language $\mathfrak{L}$ are strings of symbols of $\mathfrak{L}$ along with the logical symbols such as $($, $)$, $\wedge$, $\neg$, variables etc. You can look up in a logic textbook the inductive definition of well-formed formulas, but something like $x \wedge y$ is a well-formed formula, but $(()\neg\wedge xy \neg$ is not a well-formed formula.
A first order theory $T$ in the language $\mathfrak{L}$ is then a collection of well-formed sentences (no free variable) in the language $\mathfrak{L}$. You would then define the deduce relation $T \vdash \varphi$ to mean that there exists a proof of $\varphi$ using $T$. A proof is just a string of of sentences $\phi_1, ..., \phi_n$ such that $\phi_n = \varphi$, each $\phi_i$ is in $T$, a logical axiom of first order logic, follows from modus ponen or generalization using previous $\phi_j$, where $j < i$.
So the above is the definition of a arbitrary first order theory in an arbitrary first order language $\mathfrak{L}$. Now let $\mathfrak{L} = \{\in\}$ a first order language consisting a single binary relation. $ZFC$ is then the first order theory in the language $\mathfrak{L}$ consisting of the "eight axioms" you mentioned above. (Note that ZFC has infinitely may axioms. For example, the axiom schema of specification is actually one axiom for each formula.)
The benefit of this approach where the general definition of first order logic is developed first is that you apply this to study first order logic in general and other first order theories such that the theory of groups, rings, vector space, random graphs, etc. Also first order logic is developed in the metatheory. That is for example, a theorem of ZFC (even if it is about infinite cardinals greater than $\aleph_1$) has a finite proof in the metatheory. However, within ZFC you can formalize first order logic. Then you can consider question about whether $ZFC$ can prove it own consistency.
By taking the approach of developing first order theories in general, you also gain a certain perspective. Some people think that ZFC is something special since it can serve as a foundation for much of mathematics. Through this approach, $ZFC$ is really just another first order theory in a very simple language consisting of a single non-logical symbol. People often have a hard time grasping the idea that $ZFC$ can have different models, for instance one where the continuum hypothesis holds and one where it does not. However, almost everyone would agree that that there exists more than one model of group theory (i.e. more than one group). Sometimes it is helpful to know that results about arbitrary first order theory still apply when one is working in ZFC set theory.