Russell's paradox
In Zermelo set theory, the answer to the titular question has a straightforward proof:
- Assume there is such a set. Call it $R$.
- Fact: for every set $x$, $x \notin x$ if and only if $x \in R$. This is the defining property of $R$.
- Assume $R \in R$.
- By the fact, this means $R \notin R$.
- Contradiction!
- Therefore $R \notin R$.
- By the fact, this means $R \in R$.
- Contradiction!
- Therefore no such set exists.
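Notice that this argument uses no axioms of set theory at all, only logic. As a sketch, it can be checked in Lean 4 (core only, no Mathlib assumed) for an arbitrary relation `mem` standing in for $\in$ on an arbitrary type `α`. The names `no_russell_set` and `mem` are illustrative choices, and the two `have` steps mirror the two "Contradiction!" bullets.

```lean
/-- No element `R` can satisfy `x ∈ R ↔ x ∉ x` for all `x`,
    whatever the relation `mem` is (read `mem x y` as `x ∈ y`). -/
theorem no_russell_set {α : Type} (mem : α → α → Prop) :
    ¬ ∃ R : α, ∀ x : α, mem x R ↔ ¬ mem x x :=
  fun ⟨R, hR⟩ =>
    -- the defining property of R, specialised to x := R
    have h : mem R R ↔ ¬ mem R R := hR R
    -- assuming R ∈ R yields R ∉ R, a contradiction; therefore R ∉ R
    have hn : ¬ mem R R := fun hRR => h.mp hRR hRR
    -- but then, by the fact, R ∈ R: contradiction again
    hn (h.mpr hn)
```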
There is an immediate corollary: there is no set of all sets.
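- Assume there is a set of all sets. Call it $S$.
- By restricted comprehension, there is a subset $R \subseteq S$ containing exactly those sets $x \in S$ for which $x \notin x$.
- Since every set belongs to $S$, this $R$ contains exactly the sets $x$ with $x \notin x$, so it is a set of the kind just shown not to exist.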
- Contradiction!
- Therefore, there is no set of all sets.
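As a sketch in Lean 4 (core only, no Mathlib assumed), with an arbitrary relation `mem` standing in for $\in$ and restricted comprehension taken as a hypothesis `sep` (both illustrative assumptions, not a formalization of full Zermelo set theory), the corollary's proof goes through as written: `sep` produces $R$, the universal $S$ makes the condition "$x \in S$ and $x \notin x$" collapse to $x \notin x$, and the Russell argument finishes.

```lean
theorem no_universal_set {α : Type} (mem : α → α → Prop)
    -- restricted comprehension, assumed here as a hypothesis
    (sep : ∀ (S : α) (P : α → Prop), ∃ T : α, ∀ x, mem x T ↔ (mem x S ∧ P x)) :
    ¬ ∃ S : α, ∀ x : α, mem x S :=
  fun ⟨S, hS⟩ =>
    match sep S (fun x => ¬ mem x x) with
    | ⟨R, hR⟩ =>
      -- every x lies in S, so R's defining condition reduces to R ∈ R ↔ R ∉ R
      have h : mem R R ↔ ¬ mem R R :=
        ⟨fun hRR => ((hR R).mp hRR).2, fun hn => (hR R).mpr ⟨hS R, hn⟩⟩
      have hn : ¬ mem R R := fun hRR => h.mp hRR hRR
      hn (h.mpr hn)
```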
Rationale for Zermelo set theory
One of the most important features of a set theory is having tools to actually construct sets. Cantor's 'naive' set theory had the most powerful rule of all: if you could name any property $P$, then there was a set of all sets that have property $P$. This let you construct any set you could imagine! Unfortunately, it also lets you construct the set of Russell's paradox, and thus Cantor's set theory is self-contradictory.
Zermelo took a more modest approach*: he looked for a more conservative collection of constructions that sufficed for mathematics but was not so strong as to create any of the known paradoxical sets. Fraenkel added another useful construction (the axiom of replacement), and the axiom of foundation, which simplifies many technical arguments, was added later.
Among the constructions of Zermelo set theory is the restricted form of Cantor's "comprehension principle": if we have any property $P$ and a set $S$, then we can form the subset of $S$ of things satisfying property $P$.
The axiom of restricted comprehension is exactly the property of a universe of sets that is needed to make the corollary argument in the opening section work.
*: I do not know if this is historically accurate. Really, I'm offering an a posteriori observation about it.
Classes
Set-builder notation is a very useful way to denote sets. Recall that each of the following notations defines a set in ZFC:
$$ \{ x \in S \mid P(x) \} \qquad \qquad \{ f(x) \mid x \in S \} \qquad \qquad \{ a, b \} $$
where $a,b,S$ are all sets, $P$ is a unary predicate whose domain includes $S$, and $f$ is a function whose domain includes $S$.
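As a concrete and deliberately tiny illustration, the three constructions can be mimicked on hereditarily finite sets, modelled here as nested Python `frozenset`s. This is only a sketch under that assumption: the helper names `separate`, `replace`, and `pair` are hypothetical, and a finite model like this says nothing about infinite sets.

```python
# Sketch: hereditarily finite sets modelled as nested frozensets.
# The helper names below are illustrative, not from any library.
from typing import Callable

HFSet = frozenset  # every element is itself an HFSet

EMPTY: HFSet = frozenset()


def separate(S: HFSet, P: Callable[[HFSet], bool]) -> HFSet:
    """{ x ∈ S | P(x) }: keep exactly the elements of S satisfying P."""
    return frozenset(x for x in S if P(x))


def replace(f: Callable[[HFSet], HFSet], S: HFSet) -> HFSet:
    """{ f(x) | x ∈ S }: the image of S under f."""
    return frozenset(f(x) for x in S)


def pair(a: HFSet, b: HFSet) -> HFSet:
    """{ a, b } (which is just { a } when a == b)."""
    return frozenset({a, b})


# von Neumann naturals: 0 = ∅, 1 = {0}, 2 = {0, 1}
zero = EMPTY
one = pair(zero, zero)
two = pair(zero, one)

# { x ∈ 2 | ∅ ∈ x } keeps only 1, since 0 = ∅ has no elements
assert separate(two, lambda x: EMPTY in x) == frozenset({one})
# { {x} | x ∈ 2 } = { {0}, {1} } = { 1, {1} }
assert replace(lambda x: pair(x, x), two) == frozenset({one, pair(one, one)})
```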
The same notation turns out to be quite useful for defining predicates. For example, the predicate
$P(x)$ = "$x$ contains the empty set"
is easily notated as
$$ P = \{ x \mid \emptyset \in x \} $$
and the assertion that $x$ satisfies the predicate $P$ can be written as
$$ x \in P. $$
This notation, formally, has nothing to do with sets: it is alternative notation for logic. When we do this, we call a predicate a "class".
The way you manipulate logic in the form of classes is so strikingly similar to the way you manipulate sets that this unified notation is extremely useful.
To answer a question you had, the only objects are still sets. The only thing that can be a member of a set is a set. The only thing that can be a member of a class is a set. Classes can't be members of anything, because they aren't objects: they're logic. (At least, that is so if we stick to first-order logic...)
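One way to make the "classes are just logic" point concrete, assuming a toy model where sets are nested Python `frozenset`s: a class is represented by a predicate (a function returning `bool`), so "$x \in P$" is just the call `P(x)`, and class operations such as intersection and complement are the logical connectives "and" and "not". A class never becomes a set on its own; restricted comprehension needs a set $S$ to cut down. All names here (`contains_empty`, `intersect`, `restrict`, and so on) are hypothetical.

```python
# Sketch: classes as predicates, sets as frozensets (illustrative model only).
from typing import Callable

Class = Callable[[frozenset], bool]  # a "class" is just a predicate on sets

EMPTY = frozenset()


def contains_empty(x: frozenset) -> bool:
    """The class P = { x | ∅ ∈ x }."""
    return EMPTY in x


def is_nonempty(x: frozenset) -> bool:
    """The class { x | x ≠ ∅ }."""
    return len(x) > 0


def intersect(P: Class, Q: Class) -> Class:
    """Class intersection P ∩ Q is logical conjunction."""
    return lambda x: P(x) and Q(x)


def complement(P: Class) -> Class:
    """The complement of the class P is logical negation."""
    return lambda x: not P(x)


def restrict(S: frozenset, P: Class) -> frozenset:
    """Restricted comprehension: { x ∈ S | x ∈ P } is an actual set."""
    return frozenset(x for x in S if P(x))


zero = EMPTY
one = frozenset({zero})
two = frozenset({zero, one})

# "x ∈ P" is just P(x)
assert contains_empty(one) and not contains_empty(zero)
# cutting a class down to the set 2 yields a genuine set
assert restrict(two, intersect(contains_empty, is_nonempty)) == frozenset({one})
assert restrict(two, complement(contains_empty)) == frozenset({zero})
```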
It can be technically awkward when you have to pay attention to what is a set and what is a class, especially if you want to reason in a 'stripped down' version of formal logic.
So von Neumann, Bernays, and Gödel developed NBG (von Neumann-Bernays-Gödel) set theory*. The objects of NBG set theory are classes. It might be a little confusing to use the same word as we did for the alternative view of logic above; however, in practice it's not a problem.
NBG set theory includes a class called $\mathbf{Set}$, the class of all sets; $V$ is another commonly used name for it. There is a theorem (or axiom, depending on the presentation) that says if $x \in y$, then $x \in \mathbf{Set}$.
NBG can also be presented (and usually is, I think) as a theory with two sorts: a sort of sets and a sort of classes. Only sets may be elements of things. But for any set there is a class that has the same elements, and it is reasonable to conflate the two.
*: Again, this is not meant to be a historically accurate presentation.
Universes
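Another approach to dealing with classes is to use a Grothendieck universe. However, using one requires assuming a large cardinal axiom: a Grothendieck universe containing an infinite set is exactly a set of the form $V_\kappa$ for a strongly inaccessible cardinal $\kappa$.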
A Grothendieck universe is, briefly, a set $U$ with the property that the elements of $U$ collectively have good enough properties to be justifiably called a 'universe of sets'. We call the elements of $U$ "small sets". The things we would normally call classes are all subsets of $U$.
In this way (other than having had to assume a large cardinal axiom) we don't have to do much that is special -- everything we are talking about is a set. We just occasionally have to take note of which sets are "small" and which are not.
Best Answer
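In standard set theory it is a fundamental truth (one of the instances of the axiom schema of separation) that
(1) If $A$ is a set, then $\{ x \in A \mid x \notin x \}$ is a set.
You seem to be thinking that the argument from Russell's paradox shows that this truth cannot hold when $A$ is the set of all sets and must therefore be modified to
(2) If $A$ is a set other than the set of all sets, then $\{ x \in A \mid x \notin x \}$ is a set.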
But this is not actually necessary -- the original (1) works perfectly well, exactly because there is no set of all sets. If we try to let $A$ be a collection of all sets, then the premise "$A$ is a set" is simply false, and therefore it doesn't matter that the conclusion doesn't hold.
We don't usually state the assumption "when $A$ is a set" explicitly, because working in set theory implicitly means that everything we speak about is a set. There's a similar hidden assumption in Cantor's theorem:
If $S$ is a set, then there is no surjection from $S$ onto $\mathcal P(S)$.
So if you try to apply this to an $S$ satisfying $S=\mathcal P(S)$ and get the nonsensical conclusion that the identity function on $S$ cannot exist, Cantor's theorem is still fine, because the thing you're applying it to doesn't actually exist. If you pick the proof of Cantor's theorem apart in this case, you'll find that its crucial step forms the diagonal set $\{x \in S \mid x \notin f(x)\}$, which for $f$ the identity is $\{x \in S \mid x \notin x\}$. That is an instance of separation much like (1), indicating that the theorem really does depend on the assumption that $S$ is a set.
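To see the diagonal step concretely, here is a brute-force check on a small finite set, a sketch that assumes $S = \{0, 1, 2\}$ with subsets represented as Python `frozenset`s and functions $f : S \to \mathcal P(S)$ represented as dicts. For every such $f$, the diagonal set $D_f = \{x \in S \mid x \notin f(x)\}$ is not in the range of $f$, because $D_f$ and $f(x)$ disagree about $x$ itself. The helper names are hypothetical, and checking finitely many cases is an illustration, not a proof.

```python
# Brute-force illustration of Cantor's diagonal argument on a small finite set.
from itertools import combinations, product


def powerset(S):
    """All subsets of S, as frozensets."""
    items = list(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal(S, f):
    """D_f = { x ∈ S | x ∉ f(x) }, formed by separation from S."""
    return frozenset(x for x in S if x not in f[x])


def check_cantor(S):
    """Check every f : S -> P(S); return how many functions were checked."""
    S = list(S)
    subsets = powerset(S)
    count = 0
    # each choice of one subset per element of S is a function f : S -> P(S)
    for images in product(subsets, repeat=len(S)):
        f = dict(zip(S, images))
        d = diagonal(S, f)
        # D_f differs from f(x) at x itself, so it is never an image of f
        assert all(f[x] != d for x in S)
        count += 1
    return count


print(check_cantor({0, 1, 2}))  # prints 512 (8 subsets, 8**3 functions)
```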
So what you have found is essentially just two ways of phrasing the same argument that there is no set of all sets. They are not in conflict with each other.