The usual approach is to define the concept of a first-order language $\mathfrak{L}$. A language is usually specified by its nonlogical symbols (constant, function, and relation symbols). Well-formed formulas in the language $\mathfrak{L}$ are strings built from the symbols of $\mathfrak{L}$ together with the logical symbols such as $($, $)$, $\wedge$, $\neg$, $\exists$, variables, etc. You can look up the inductive definition of well-formed formulas in a logic textbook, but, for example, $\neg(x = y) \wedge (y = x)$ is a well-formed formula, while $(()\neg\wedge xy \neg$ is not.
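To illustrate that "well-formed" is a purely mechanical property of strings, here is a minimal recognizer. It assumes a hypothetical, simplified grammar for the language $\{\in\}$ with equality: single-letter variables, atomic formulas such as `x∈y` or `x=y`, negation `¬φ`, fully parenthesized conjunctions `(φ∧ψ)`, and `∃v φ`. Real textbook grammars differ in details, so treat this as a sketch only.

```python
# Hypothetical toy grammar: variables are single letters from this set.
VARIABLES = set("xyzuvw")

def parse(s: str, i: int = 0):
    """Try to read one formula starting at index i.

    Returns the index just past the formula, or None if no
    well-formed formula starts at position i.
    """
    if i >= len(s):
        return None
    c = s[i]
    if c in VARIABLES:
        # Atomic formula: variable, relation symbol (∈ or =), variable.
        if i + 2 < len(s) and s[i + 1] in "∈=" and s[i + 2] in VARIABLES:
            return i + 3
        return None
    if c == "¬":
        return parse(s, i + 1)
    if c == "∃":
        # Quantifier must be followed by a variable, then a formula.
        if i + 1 < len(s) and s[i + 1] in VARIABLES:
            return parse(s, i + 2)
        return None
    if c == "(":
        # Conjunction: "(" formula "∧" formula ")".
        j = parse(s, i + 1)
        if j is None or j >= len(s) or s[j] != "∧":
            return None
        k = parse(s, j + 1)
        if k is None or k >= len(s) or s[k] != ")":
            return None
        return k + 1
    return None

def is_wff(s: str) -> bool:
    """A string is well formed if one formula consumes all of it."""
    s = s.replace(" ", "")
    return parse(s) == len(s)

print(is_wff("(¬x=y∧y=x)"))  # True
print(is_wff("(()¬∧xy¬"))    # False
```

The recursive structure of `parse` mirrors the inductive definition: each clause of the definition becomes one branch.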
A first-order theory $T$ in the language $\mathfrak{L}$ is then a collection of sentences (well-formed formulas with no free variables) in the language $\mathfrak{L}$. You then define the deducibility relation $T \vdash \varphi$ to mean that there exists a proof of $\varphi$ from $T$. A proof is just a finite sequence of formulas $\phi_1, \ldots, \phi_n$ such that $\phi_n = \varphi$ and each $\phi_i$ is either in $T$, a logical axiom of first-order logic, or follows by modus ponens or generalization from earlier formulas $\phi_j$ with $j < i$.
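Checking whether a given sequence is a proof is likewise a finite, mechanical task. The sketch below is restricted to a propositional fragment with modus ponens as the only rule; generalization and quantifier axioms are left out, and the logical axioms are passed in as a predicate. The names `is_proof` and `implies` and the tuple representation of formulas are hypothetical choices for illustration.

```python
from typing import Callable, Sequence, Set, Tuple, Union

# Hypothetical representation: an atomic formula is a string, and
# ("->", A, B) stands for the implication A -> B.
Formula = Union[str, Tuple]

def implies(a: Formula, b: Formula) -> Formula:
    return ("->", a, b)

def is_proof(lines: Sequence[Formula], theory: Set[Formula],
             is_logical_axiom: Callable[[Formula], bool],
             goal: Formula) -> bool:
    """Return True if `lines` is a proof of `goal` from `theory`.

    Every line must be in the theory, be a logical axiom, or follow from
    two earlier lines by modus ponens (from psi and psi -> phi, infer phi).
    Generalization is deliberately left out of this sketch.
    """
    if not lines or lines[-1] != goal:
        return False
    for i, phi in enumerate(lines):
        earlier = lines[:i]
        if phi in theory or is_logical_axiom(phi):
            continue
        # Modus ponens: look for some earlier psi with psi -> phi also earlier.
        if any(implies(psi, phi) in earlier for psi in earlier):
            continue
        return False
    return True

# Usage: from T = {p, p -> q}, the sequence p, p -> q, q proves q.
T = {"p", implies("p", "q")}
no_axioms = lambda f: False
print(is_proof(["p", implies("p", "q"), "q"], T, no_axioms, "q"))  # True
print(is_proof(["q"], T, no_axioms, "q"))  # False: q is not justified
```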
So the above is the definition of an arbitrary first-order theory in an arbitrary first-order language $\mathfrak{L}$. Now let $\mathfrak{L} = \{\in\}$, the first-order language consisting of a single binary relation symbol. $ZFC$ is then the first-order theory in the language $\mathfrak{L}$ consisting of the "eight axioms" you mentioned above. (Note that ZFC has infinitely many axioms. For example, the axiom schema of specification is actually one axiom for each formula.)
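For concreteness, one common way to write the specification schema is the following: for each formula $\varphi$ of $\mathfrak{L}$ whose free variables are among $x, w_1, \ldots, w_n, A$ (with $B$ not free in $\varphi$), ZFC contains the axiom

$$\forall w_1 \ldots \forall w_n\, \forall A\, \exists B\, \forall x\, \bigl(x \in B \leftrightarrow (x \in A \wedge \varphi)\bigr).$$

Each choice of $\varphi$ yields a different sentence, which is why the schema contributes infinitely many axioms.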
The benefit of this approach, where the general definition of first-order logic is developed first, is that you can apply it to study first-order logic in general and other first-order theories such as the theory of groups, rings, vector spaces, random graphs, etc. Also, first-order logic is developed in the metatheory. That is, for example, a theorem of ZFC (even one about infinite cardinals greater than $\aleph_1$) has a finite proof in the metatheory. However, first-order logic can also be formalized within ZFC. Then you can consider the question of whether $ZFC$ can prove its own consistency.
By taking the approach of developing first-order theories in general, you also gain a certain perspective. Some people think that ZFC is something special since it can serve as a foundation for much of mathematics. From this point of view, $ZFC$ is really just another first-order theory, in a very simple language consisting of a single nonlogical symbol. People often have a hard time grasping the idea that $ZFC$ can have different models, for instance one where the continuum hypothesis holds and one where it does not. However, almost everyone would agree that there exists more than one model of group theory (i.e. more than one group). It is sometimes helpful to know that results about arbitrary first-order theories still apply when one is working in ZFC set theory.
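As a small illustration of "more than one model", the sketch below checks the group axioms on finite structures given by a domain, a binary operation, and an identity element. It shows that $\mathbb{Z}/2$ and $\mathbb{Z}/3$ under addition both satisfy the axioms (and they are clearly different groups, having different sizes), while $\{0, 1, 2\}$ under multiplication mod 3 does not. The function name `is_group` is a hypothetical choice.

```python
from itertools import product
from typing import Callable, Sequence

def is_group(elems: Sequence[int], op: Callable[[int, int], int], e: int) -> bool:
    """Check the group axioms on the finite structure (elems, op, e)."""
    # Closure is automatic for a genuine first-order structure, but a Python
    # lambda could leave the domain, so we check it explicitly.
    closed = all(op(a, b) in elems for a, b in product(elems, repeat=2))
    assoc = all(op(op(a, b), c) == op(a, op(b, c))
                for a, b, c in product(elems, repeat=3))
    identity = all(op(e, a) == a and op(a, e) == a for a in elems)
    inverses = all(any(op(a, b) == e and op(b, a) == e for b in elems)
                   for a in elems)
    return closed and assoc and identity and inverses

print(is_group(range(2), lambda a, b: (a + b) % 2, 0))  # True
print(is_group(range(3), lambda a, b: (a + b) % 3, 0))  # True
print(is_group(range(3), lambda a, b: (a * b) % 3, 1))  # False: 0 has no inverse
```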
Quine's *Methods of Logic* is a good example of a treatment that makes virtually no reference, even informal, to sets. His treatment instead talks informally about strings and the truth-functional connectives that form new strings; basically, he covers the things that go into the definition of a well-formed formula, but never needs to talk about the set of wffs.
The trade-off here is that while it's easy to talk about proofs this way, the few places where he discusses model-theoretic ideas are very sketchy and informal. I never really found his sections on the soundness and completeness of his proof methods informative. So this "set-free" approach is good for learning how to do proofs, but isn't good for particularly deep insights into logic as a formal system; personally, I see this as a feature as much as a bug.
Edit: In light of Carl Mummert's comment and answer, I thought I should clarify that I use Quine only as an example here; see Carl's comment below for reasons why this might not be a great textbook for a mathematics student. While I do consider a treatment of logic in terms of strings and formation rules to be helpful in justifying the use of first-order logic in a foundational setting, I think Carl's answer also highlights why one can safely be indifferent to the use of mild set-talk.
(1) This is actually not a problem in the form you have stated it: the rules for what counts as a valid proof in first-order logic can be stated without any reference to sets, for example by speaking purely about operations on concrete strings of symbols, or by arithmetization with Gödel numbers.
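To make "arithmetization" concrete, here is a sketch of one classic Gödel-numbering scheme: give each symbol a positive code, then encode the string $s_1 s_2 \ldots s_k$ as $2^{c_1} 3^{c_2} 5^{c_3} \cdots$. By unique factorization the string can be recovered, so statements about strings become statements about natural numbers. The particular code table `SYMBOL_CODES` is a hypothetical assignment, not a standard one.

```python
from itertools import count
from typing import Iterator

# Hypothetical code assignment; any injective map to positive integers works.
SYMBOL_CODES = {"(": 1, ")": 2, "¬": 3, "∧": 4, "∃": 5, "∈": 6, "=": 7,
                "x": 8, "y": 9, "z": 10}

def primes() -> Iterator[int]:
    """Yield 2, 3, 5, 7, ... by trial division (fine for short strings)."""
    found: list = []
    for k in count(2):
        if all(k % p for p in found):
            found.append(k)
            yield k

def godel_number(s: str) -> int:
    """Encode s as the product of p_i ** code(s_i) over the first len(s) primes."""
    n = 1
    for p, ch in zip(primes(), s):
        n *= p ** SYMBOL_CODES[ch]
    return n

def decode(n: int) -> str:
    """Recover the string by reading off prime exponents in order.

    Raises KeyError if n is not the code of any string.
    """
    inverse = {c: ch for ch, c in SYMBOL_CODES.items()}
    out = []
    for p in primes():
        if n == 1:
            break
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        out.append(inverse[exponent])
    return "".join(out)

print(godel_number("x∈y"))          # 364500000000, i.e. 2**8 * 3**6 * 5**9
print(decode(godel_number("x∈y")))  # x∈y
```

Because every code is at least 1, each prime up to the last one appears with a nonzero exponent, which is what lets `decode` stop as soon as the remaining quotient is 1.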
However, if you want to do model theory on your first-order theory, you need sets. And even if you take the syntactic viewpoint and say that it is all just strings, that just pushes the fundamental problem down a level: how can we then formalize reasoning about natural numbers (or symbol strings) if first-order logic itself "depends on" natural numbers (or symbol strings)?
The answer is that this is just how it is: the formalization of first-order logic is not really the ultimate basis for all of mathematics, but a mathematical model of mathematical reasoning itself. The model is not the thing, and mathematical reasoning is ultimately not really a formal theory, but something we do because we intuitively believe that it works.
(2) This is a misunderstanding. In axiomatic set theory, the axioms themselves are the definition of the notion of a set: a set is whatever behaves the way the axioms say sets behave.
(3) What you quote is how functions are usually modeled in set theory. Again, the model is not the thing, and just because we can create a model of our abstract concept of a functional relation in set theory, it doesn't mean that our abstract concept in itself is necessarily a creature of set theory. Logic has its own way of modeling functional relations, namely by writing down syntactic rules for how they must behave. This is less expressive but sufficient for logic's needs, and is no less valid as a model of functional relations than the set-theoretic model is.
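To make the contrast concrete, here are the two modelings side by side in their standard textbook form. In set theory, $f$ is a function from $A$ to $B$ when

$$f \subseteq A \times B \;\wedge\; \forall a \in A\; \exists! b\; \bigl((a, b) \in f\bigr),$$

whereas in first-order logic a unary function symbol $f$ is simply a piece of syntax whose behavior is governed by rules such as the equality axiom

$$\forall x\, \forall y\, \bigl(x = y \rightarrow f(x) = f(y)\bigr).$$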